Middleware#
redis-openai-agents ships an around-style middleware layer for the OpenAI
Agents SDK, modelled on LangChain’s AgentMiddleware protocol. A middleware
wraps each model call, letting you observe, mutate, short-circuit, or replace
the LLM invocation without touching the agent.
When to use middleware#
The Agents SDK already exposes RunHooks for lifecycle observation. Use
middleware when you need to:
- Short-circuit an LLM call with a cached, routed, or fallback response.
- Mutate the request before it reaches the model (inject context, redact).
- Mutate the response on the way back (augment, sanitize).
- Compose multiple cross-cutting concerns (cache + router + rate limit).
Prerequisites#
docker run -d --name redis -p 6379:6379 redis:8
import os
os.environ.setdefault("OPENAI_API_KEY", "your-api-key-here")
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379")
The protocol#
AgentMiddleware is any object with an async awrap_model_call(request, handler)
method. request is a ModelRequest dataclass; handler is the next link
in the chain (call it to delegate, skip it to short-circuit).
from redis_openai_agents import ModelRequest

class RedactingMiddleware:
    """Strip API keys from user input before the LLM sees them."""

    async def awrap_model_call(self, request: ModelRequest, handler):
        # Mutate the request (the dataclass is mutable).
        if isinstance(request.input, str):
            request.input = request.input.replace("sk-SECRET", "[REDACTED]")
        return await handler(request)
MiddlewareStack#
MiddlewareStack composes middlewares and implements the SDK Model interface,
so you pass it directly to Agent(model=...).
from redis_openai_agents import MiddlewareStack
Short-circuiting with SemanticRouterMiddleware#
Cheapest path: match the input against a set of known intents and return a canned response without calling the LLM at all.
from redis_openai_agents import Route, SemanticRouter
from redis_openai_agents.middleware import SemanticRouterMiddleware

router = SemanticRouter(
    name="user_guide_router",
    routes=[
        Route(name="greeting", references=["hello", "hi", "hey"], distance_threshold=0.3),
        Route(name="thanks", references=["thank you", "thanks"], distance_threshold=0.3),
    ],
    redis_url=REDIS_URL,
)

router_mw = SemanticRouterMiddleware(
    router=router,
    responses={
        "greeting": "Hello! How can I help?",
        "thanks": "You're welcome!",
    },
)
When a route matches, the middleware returns a pre-built ModelResponse
(plain strings are auto-wrapped). When no route matches, or the matched route
has no mapped response, the request falls through to the next link in the chain.
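The same delegate-or-short-circuit pattern applies when you write your own middleware. Here is a minimal sketch (not shipped with the library) of a fallback middleware that serves a canned reply when the downstream call raises, using the text_response helper covered later in this guide:

from redis_openai_agents.middleware import text_response

class FallbackMiddleware:
    """Short-circuit with a canned reply when the downstream call fails."""

    async def awrap_model_call(self, request, handler):
        try:
            return await handler(request)  # delegate to cache / router / LLM
        except Exception:
            # Skip the rest of the chain and return a pre-built response.
            return text_response("Sorry, something went wrong. Please try again.")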
Caching with SemanticCacheMiddleware#
Cache LLM responses keyed by the semantic similarity of the input. The cache is
skipped automatically when tools, handoffs, or output_schema are present,
since those requests are typically non-deterministic.
from redis_openai_agents import SemanticCache
from redis_openai_agents.middleware import SemanticCacheMiddleware

cache = SemanticCache(
    redis_url=REDIS_URL,
    similarity_threshold=0.92,
    name="user_guide_mw_cache",
    ttl=3600,
)

cache_mw = SemanticCacheMiddleware(cache=cache)
The default serializer pickle+base64-encodes the response, which handles
ModelResponse end to end. Supply your own serializer / deserializer if
you need structured or security-sensitive storage.
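As a sketch of the latter, assuming SemanticCacheMiddleware accepts serializer and deserializer callables that convert a ModelResponse to and from the stored string (the keyword names here are assumptions; check your version's signature), a JSON scheme that stores only the reply text might look like:

import json

from redis_openai_agents.middleware import text_response

def serialize(response):
    # Assumes a single text output item, the shape text_response produces.
    return json.dumps({"text": response.output[0].content[0].text})

def deserialize(payload):
    # Rebuild a ModelResponse from the stored text.
    return text_response(json.loads(payload)["text"])

# Keyword names are assumed; adjust to the actual SemanticCacheMiddleware signature.
cache_mw = SemanticCacheMiddleware(cache=cache, serializer=serialize, deserializer=deserialize)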
Composing the stack#
Middleware order matters: outer-to-inner for the request path, inner-to-outer for the response path. Put cheap short-circuiters first.
# Would normally wrap a real OpenAI model. Uses a stub here for illustration.
class StubModel:
    async def get_response(self, **kwargs):
        from redis_openai_agents.middleware import text_response
        return text_response("stub answer")

    def stream_response(self, **kwargs):
        raise NotImplementedError

    async def close(self):
        pass

stack = MiddlewareStack(
    model=StubModel(),
    middlewares=[router_mw, cache_mw],  # router first, then cache, then LLM
)
print("Stack composition:")
for i, mw in enumerate(stack.middlewares):
    print(f"  {i + 1}. {type(mw).__name__}")
    print("     ↓")
print(f"  inner: {type(stack.inner).__name__}")
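Because any object with awrap_model_call can join the stack, adding another cross-cutting concern is one more list entry. A minimal sketch (the class is illustrative, not part of the library) of a timing middleware that observes without mutating or short-circuiting:

import time

class TimingMiddleware:
    """Log each model call's latency; never mutates or short-circuits."""

    async def awrap_model_call(self, request, handler):
        start = time.perf_counter()
        try:
            return await handler(request)  # always delegate
        finally:
            print(f"model call took {time.perf_counter() - start:.3f}s")

# Placed innermost so router/cache hits short-circuit before the timer runs
# and only requests that actually reach the model get timed.
stack = MiddlewareStack(
    model=StubModel(),
    middlewares=[router_mw, cache_mw, TimingMiddleware()],
)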
Using the stack with an Agent#
Drop-in replacement for the model parameter.
from agents import Agent, Runner
from agents.models.openai_responses import OpenAIResponsesModel
from openai import AsyncOpenAI
base_model = OpenAIResponsesModel(model="gpt-4o-mini", openai_client=AsyncOpenAI())
stack = MiddlewareStack(model=base_model, middlewares=[router_mw, cache_mw])
agent = Agent(name="assistant", instructions="Be concise.", model=stack)
result = await Runner.run(agent, "hello")
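Runner.run returns the SDK's usual RunResult. Because "hello" matches the greeting route configured above, the router should serve the reply without an OpenAI call:

print(result.final_output)  # expected: "Hello! How can I help?" (short-circuited by the router)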
See examples/16-middleware.ipynb for a runnable end-to-end walkthrough with
real OpenAI calls and timing measurements.
Injecting history with ConversationMemoryMiddleware#
ConversationMemoryMiddleware uses RedisVL’s SemanticMessageHistory to
retrieve past messages relevant to the current user input and prepends them
to the request. After the LLM replies, the middleware stores both the user
turn and the reply back in history for future retrieval.
from redisvl.extensions.message_history import SemanticMessageHistory
from redis_openai_agents.middleware import ConversationMemoryMiddleware

history = SemanticMessageHistory(
    name="user_guide_memory",
    session_tag="user-42",
    redis_url=REDIS_URL,
    distance_threshold=0.3,
)

memory_mw = ConversationMemoryMiddleware(
    history=history,
    session_tag="user-42",
    top_k=5,
)
Use with the stack like any other middleware:
stack = MiddlewareStack(
    model=base_model,
    middlewares=[router_mw, memory_mw, cache_mw],
)
This middleware mutates request.input; it does not short-circuit. Put it
before the cache so that cache lookups and stored entries see the same
history-augmented input.
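As an illustrative two-turn session (the prompts are invented; assumes the stack above wraps a real model):

agent = Agent(name="assistant", instructions="Be concise.", model=stack)

# Turn 1: the user message and the reply are stored in history.
result = await Runner.run(agent, "My favourite colour is teal.")

# Turn 2: relevant past messages are retrieved and prepended to the input.
result = await Runner.run(agent, "What's my favourite colour?")
print(result.final_output)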
text_response helper#
When writing your own short-circuiting middleware, use text_response to
build a ModelResponse from a plain string:
from redis_openai_agents.middleware import text_response
response = text_response("I always answer the same way.")
print(type(response).__name__)
print(response.output[0].content[0].text)
Summary#
- Protocol: any object with awrap_model_call(request, handler).
- Stack: MiddlewareStack(model=..., middlewares=[...]) implements the SDK Model interface.
- Shipped middlewares: SemanticRouterMiddleware, SemanticCacheMiddleware, ConversationMemoryMiddleware.
- Helpers: text_response(text) for hand-building short-circuit responses.
- Order matters: place the cheapest / most-likely-to-short-circuit middlewares first.