Middleware#
redis-openai-agents ships an around-style middleware layer for the OpenAI
Agents SDK, modelled on LangChain’s AgentMiddleware protocol. A middleware
wraps each model call, letting you observe, mutate, short-circuit, or replace
the LLM invocation without touching the agent.
When to use middleware#
The Agents SDK already exposes RunHooks for lifecycle observation. Use
middleware when you need to:
- Short-circuit an LLM call with a cached, routed, or fallback response.
- Mutate the request before it reaches the model (inject context, redact).
- Mutate the response on the way back (augment, sanitize).
- Compose multiple cross-cutting concerns (cache + router + rate limit).
Prerequisites#
docker run -d --name redis -p 6379:6379 redis:8
import os
os.environ.setdefault("OPENAI_API_KEY", "your-api-key-here")
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379")
The protocol#
AgentMiddleware is any object with an async awrap_model_call(request, handler)
method. request is a ModelRequest dataclass; handler is the next link
in the chain (call it to delegate, skip it to short-circuit).
from redis_openai_agents import ModelRequest

class RedactingMiddleware:
    """Strip API keys from user input before the LLM sees them."""

    async def awrap_model_call(self, request: ModelRequest, handler):
        # Mutate the request (the dataclass is mutable).
        if isinstance(request.input, str):
            request.input = request.input.replace("sk-SECRET", "[REDACTED]")
        return await handler(request)
MiddlewareStack#
MiddlewareStack composes middlewares and implements the SDK Model interface,
so you pass it directly to Agent(model=...).
from redis_openai_agents import MiddlewareStack
Short-circuiting with SemanticRouterMiddleware#
Cheapest path: match the input against a set of known intents and return a canned response without calling the LLM at all.
from redis_openai_agents import Route, SemanticRouter
from redis_openai_agents.middleware import SemanticRouterMiddleware

router = SemanticRouter(
    name="user_guide_router",
    routes=[
        Route(name="greeting", references=["hello", "hi", "hey"], distance_threshold=0.3),
        Route(name="thanks", references=["thank you", "thanks"], distance_threshold=0.3),
    ],
    redis_url=REDIS_URL,
)

router_mw = SemanticRouterMiddleware(
    router=router,
    responses={
        "greeting": "Hello! How can I help?",
        "thanks": "You're welcome!",
    },
)
When a route matches, the middleware returns a pre-built ModelResponse
(plain strings are auto-wrapped). When no route matches, or the matched route
has no mapped response, the request falls through to the next link in the chain.
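The same delegate-or-short-circuit pattern applies when you write your own middleware. Here is a minimal sketch (not shipped with the library) of a fallback middleware that serves a canned reply when the downstream call raises, using the text_response helper covered later in this guide:

from redis_openai_agents.middleware import text_response

class FallbackMiddleware:
    """Short-circuit with a canned reply when the downstream call fails."""

    async def awrap_model_call(self, request, handler):
        try:
            return await handler(request)  # delegate to cache / router / LLM
        except Exception:
            # Skip the rest of the chain and return a pre-built response.
            return text_response("Sorry, something went wrong. Please try again.")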
Caching with SemanticCacheMiddleware#
Cache LLM responses keyed by the semantic similarity of the input. The cache is
skipped automatically when tools, handoffs, or output_schema are present,
since those requests are typically non-deterministic.
from redis_openai_agents import SemanticCache
from redis_openai_agents.middleware import SemanticCacheMiddleware

cache = SemanticCache(
    redis_url=REDIS_URL,
    similarity_threshold=0.92,
    name="user_guide_mw_cache",
    ttl=3600,
)

cache_mw = SemanticCacheMiddleware(cache=cache)
The default serializer pickle+base64-encodes the response, which handles
ModelResponse end to end. Supply your own serializer / deserializer if
you need structured or security-sensitive storage.
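As a sketch of the latter, assuming SemanticCacheMiddleware accepts serializer and deserializer callables that convert a ModelResponse to and from the stored string (the keyword names here are assumptions; check your version's signature), a JSON scheme that stores only the reply text might look like:

import json

from redis_openai_agents.middleware import text_response

def serialize(response):
    # Assumes a single text output item, the shape text_response produces.
    return json.dumps({"text": response.output[0].content[0].text})

def deserialize(payload):
    # Rebuild a ModelResponse from the stored text.
    return text_response(json.loads(payload)["text"])

# Keyword names are assumed; adjust to the actual SemanticCacheMiddleware signature.
cache_mw = SemanticCacheMiddleware(cache=cache, serializer=serialize, deserializer=deserialize)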
Composing the stack#
Middleware order matters: outer-to-inner for the request path, inner-to-outer for the response path. Put cheap short-circuiters first.
# Would normally wrap a real OpenAI model. Uses a stub here for illustration.
class StubModel:
    async def get_response(self, **kwargs):
        from redis_openai_agents.middleware import text_response
        return text_response("stub answer")

    def stream_response(self, **kwargs):
        raise NotImplementedError

    async def close(self):
        pass

stack = MiddlewareStack(
    model=StubModel(),
    middlewares=[router_mw, cache_mw],  # router first, then cache, then LLM
)
print("Stack composition:")
for i, mw in enumerate(stack.middlewares):
    print(f"  {i + 1}. {type(mw).__name__}")
    print("     ↓")
print(f"  inner: {type(stack.inner).__name__}")
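Because any object with awrap_model_call can join the stack, adding another cross-cutting concern is one more list entry. A minimal sketch (the class is illustrative, not part of the library) of a timing middleware that observes without mutating or short-circuiting:

import time

class TimingMiddleware:
    """Log each model call's latency; never mutates or short-circuits."""

    async def awrap_model_call(self, request, handler):
        start = time.perf_counter()
        try:
            return await handler(request)  # always delegate
        finally:
            print(f"model call took {time.perf_counter() - start:.3f}s")

# Placed innermost so router/cache hits short-circuit before the timer runs
# and only requests that actually reach the model get timed.
stack = MiddlewareStack(
    model=StubModel(),
    middlewares=[router_mw, cache_mw, TimingMiddleware()],
)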
Using the stack with an Agent#
Drop-in replacement for the model parameter.
from agents import Agent, Runner
from agents.models.openai_responses import OpenAIResponsesModel
from openai import AsyncOpenAI
base_model = OpenAIResponsesModel(model="gpt-4o-mini", openai_client=AsyncOpenAI())
stack = MiddlewareStack(model=base_model, middlewares=[router_mw, cache_mw])
agent = Agent(name="assistant", instructions="Be concise.", model=stack)
result = await Runner.run(agent, "hello")
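Runner.run returns the SDK's usual RunResult. Because "hello" matches the greeting route configured above, the router should serve the reply without an OpenAI call:

print(result.final_output)  # expected: "Hello! How can I help?" (short-circuited by the router)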
See examples/16-middleware.ipynb for a runnable end-to-end walkthrough with
real OpenAI calls and timing measurements.
Injecting history with ConversationMemoryMiddleware#
ConversationMemoryMiddleware uses RedisVL’s SemanticMessageHistory to
retrieve past messages relevant to the current user input and prepends them
to the request. After the LLM replies, the middleware stores both the user
turn and the reply back in history for future retrieval.
from redisvl.extensions.message_history import SemanticMessageHistory
from redis_openai_agents.middleware import ConversationMemoryMiddleware

history = SemanticMessageHistory(
    name="user_guide_memory",
    session_tag="user-42",
    redis_url=REDIS_URL,
    distance_threshold=0.3,
)

memory_mw = ConversationMemoryMiddleware(
    history=history,
    session_tag="user-42",
    top_k=5,
)
Use with the stack like any other middleware:
stack = MiddlewareStack(
    model=base_model,
    middlewares=[router_mw, memory_mw, cache_mw],
)
This middleware mutates request.input; it does not short-circuit. Put it
before the cache so that cache lookups and stored entries see the same
history-augmented input.
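As an illustrative two-turn session (the prompts are invented; assumes the stack above wraps a real model):

agent = Agent(name="assistant", instructions="Be concise.", model=stack)

# Turn 1: the user message and the reply are stored in history.
result = await Runner.run(agent, "My favourite colour is teal.")

# Turn 2: relevant past messages are retrieved and prepended to the input.
result = await Runner.run(agent, "What's my favourite colour?")
print(result.final_output)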
text_response helper#
When writing your own short-circuiting middleware, use text_response to
build a ModelResponse from a plain string:
from redis_openai_agents.middleware import text_response
response = text_response("I always answer the same way.")
print(type(response).__name__)
print(response.output[0].content[0].text)
Summary#
- Protocol: any object with awrap_model_call(request, handler).
- Stack: MiddlewareStack(model=..., middlewares=[...]) implements the SDK Model interface.
- Shipped middlewares: SemanticRouterMiddleware, SemanticCacheMiddleware, ConversationMemoryMiddleware.
- Helpers: text_response(text) for hand-building short-circuit responses.
- Order matters: place the cheapest / most-likely-to-short-circuit middlewares first.