# Redis OpenAI Agents
Production-ready Redis integrations for the OpenAI Agents SDK. Replace 5+ separate systems with a single Redis deployment.
- **Session Storage**: Persistent, distributed conversation storage with `AgentSession` and `JSONSession`.
- **Semantic Caching**: Reduce LLM costs by 25%+ with two-level semantic caching.
- **Semantic Routing**: Route queries to agents using vector similarity; no LLM calls required.
- **Vector Search**: Build RAG applications with Redis vector search (HNSW).
- **Token Streaming**: Reliable, replayable token streaming via Redis Streams (see the sketch after this list).
- **Observability**: Built-in metrics with RedisTimeSeries and Prometheus.
- **Middleware**: Around-style middleware for the Agents SDK: cache, router, composition.
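The token-streaming piece, for instance, builds on plain Redis Streams: each token is appended with `XADD` and can be re-read from any offset, which is what makes the stream replayable after a dropped connection. Below is a minimal sketch of that underlying pattern using `redis-py` directly; the stream name `agent:stream:demo` and the `token` field are illustrative, not the package's streaming API.

```python
import redis

# Sketch of the replayable token-stream pattern with plain redis-py
# (illustrative only; not the redis-openai-agents streaming API).
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Producer: append each generated token to a stream.
for token in ["Paris", " is", " the", " capital", "."]:
    r.xadd("agent:stream:demo", {"token": token})

# Consumer: read from the beginning ("0"), so a reconnecting client
# can replay every token it missed.
last_id = "0"
for _stream, messages in r.xread({"agent:stream:demo": last_id}, block=1000):
    for entry_id, fields in messages:
        print(fields["token"], end="")
        last_id = entry_id  # resume point for the next XREAD
print()
```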
## Installation

Install `redis-openai-agents` into a Python (>= 3.10) environment using pip:

```bash
pip install redis-openai-agents
```
Then make sure Redis is reachable with the Search & Query features enabled, either on Redis Cloud or locally in Docker. The official `redis:8` image includes Search, JSON, Time Series, and Bloom filters:

```bash
docker run -d --name redis -p 6379:6379 redis:8
```

For a GUI, run Redis Insight separately: `docker run -d --name redisinsight -p 5540:5540 redis/redisinsight:latest`.
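To confirm the container is up and the modules are available, a quick check with `redis-py` looks like this; a minimal sketch, assuming the default local URL:

```python
import redis

# Sanity check that Redis is reachable and the modules are loaded
# (adjust the URL for your deployment).
r = redis.Redis.from_url("redis://localhost:6379", decode_responses=True)
print(r.ping())          # True if the server is reachable
print(r.module_list())   # should include search, JSON, and timeseries entries
```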
## Quick Start

### Agent Sessions
Replace SQLite sessions with Redis for persistent, distributed conversation storage:
```python
import asyncio

from agents import Agent, Runner
from redis_openai_agents import AgentSession


async def main():
    # Create a Redis-backed session
    session = AgentSession.create(
        user_id="user_123",
        redis_url="redis://localhost:6379"
    )
    # Define your agent
    agent = Agent(name="assistant", instructions="You are a helpful assistant.")
    # Run the agent
    result = await Runner.run(agent, input="Hello!")
    # Store the conversation in Redis
    session.store_agent_result(result)


asyncio.run(main())
```
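Because the session lives in Redis rather than a local SQLite file, any process pointed at the same server can pick the conversation back up. The snippet below sketches that idea; reusing `user_id` to reattach and the `get_items()` accessor are assumptions for illustration, not confirmed APIs:

```python
# Hypothetical: reattach to the same conversation from another process.
# AgentSession.create() with an existing user_id is assumed to reuse the
# stored history; get_items() is an illustrative name, not a confirmed API.
session = AgentSession.create(
    user_id="user_123",
    redis_url="redis://localhost:6379",
)
history = session.get_items()  # hypothetical accessor; see the session docs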
### Semantic Caching

Reduce LLM costs by caching responses to semantically similar queries:
```python
from redis_openai_agents import SemanticCache

cache = SemanticCache(
    redis_url="redis://localhost:6379",
    similarity_threshold=0.9,  # minimum similarity score for a hit
    ttl=3600                   # cache entries expire after one hour
)

# Check the cache before calling the LLM
result = cache.get(query="What is the capital of France?")
if result:
    print(f"Cache hit: {result.response}")
else:
    response = "Paris is the capital of France."
    cache.set(query="What is the capital of France?", response=response)
```
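In practice the cache sits in front of the agent call as a read-through layer: serve from Redis on a hit, otherwise run the agent and store its output. Here is a sketch under the assumptions that the cache API is as shown above and that the run result exposes the SDK's `final_output`:

```python
import asyncio

from agents import Agent, Runner
from redis_openai_agents import SemanticCache

cache = SemanticCache(redis_url="redis://localhost:6379",
                      similarity_threshold=0.9, ttl=3600)
agent = Agent(name="assistant", instructions="You are a helpful assistant.")


async def answer(query: str) -> str:
    # Serve semantically similar queries from Redis without an LLM call.
    hit = cache.get(query=query)
    if hit:
        return hit.response
    # Cache miss: run the agent, then store the answer for future queries.
    result = await Runner.run(agent, input=query)
    cache.set(query=query, response=result.final_output)
    return result.final_output


print(asyncio.run(answer("What is the capital of France?")))
```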
### Semantic Routing

Route queries to the appropriate agent using vector similarity:
```python
from redis_openai_agents import SemanticRouter, Route

router = SemanticRouter(
    name="support-router",
    redis_url="redis://localhost:6379",
    routes=[
        Route(
            name="billing",
            references=["payment issue", "invoice", "refund request"],
            metadata={"agent": "billing_agent"},
            distance_threshold=0.3
        ),
        Route(
            name="technical",
            references=["bug report", "error message", "not working"],
            metadata={"agent": "tech_agent"},
            distance_threshold=0.3
        ),
    ]
)

# Route a query (vector lookup, not an LLM call)
match = router.route("I need help with my payment")
print(f"Route to: {match.name}")  # "billing"
```
## Why Redis OpenAI Agents?

| Challenge | Without Redis | With Redis OpenAI Agents |
|---|---|---|
| Session Storage | SQLite (single-node) | Distributed Redis sessions |
| Caching | None or external service | Built-in semantic cache |
| Vector Search | Pinecone, Qdrant ($70+/mo) | Redis Vector Search (free) |
| Streaming | Custom WebSocket code | Redis Streams (reliable) |
| Metrics | Prometheus + Grafana setup | Built-in TimeSeries |
| Total Services | 5+ separate systems | 1 Redis deployment |