Semantic Caching
adk-redis provides semantic caching that skips LLM calls when a user sends a prompt that is similar (or identical) to one already answered. This reduces latency and cost without changing agent behavior.
Quick Reference
| Feature | Details |
|---|---|
| What it caches | LLM responses keyed by prompt similarity |
| Similarity | Vector distance between prompt embeddings |
| Providers | RedisVLCacheProvider (self-hosted) or LangCacheProvider (managed) |
| TTL | Configurable per-entry expiration |
| Integration | ADK before_model_callback / after_model_callback hooks |
How It Works
```mermaid
flowchart TD
    U([User prompt]) --> BC[before_model_callback<br/>embed prompt, search cache]
    BC --> D{Cache hit?}
    D -->|Yes| CR([Return cached response<br/>no LLM call])
    D -->|No| LLM[Call LLM]
    LLM --> AC[after_model_callback<br/>store response in cache]
    AC --> R([Return LLM response])
    subgraph Cache [Redis Cache]
        SE[(Semantic index<br/>prompt embeddings)]
    end
    BC <--> Cache
    AC --> Cache
```
- Before the LLM is called, LLMResponseCache embeds the prompt and searches the cache for a semantically similar entry.
- If the distance is below the configured threshold, the cached response is returned immediately (no LLM call).
- If no match is found, the LLM runs normally and the response is stored in the cache for future hits.
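The three steps above can be sketched with a small in-memory stand-in for the Redis index. This is an illustration of the lookup logic only, not adk-redis's implementation: `ToySemanticCache`, `letter_embed`, and `answer` are hypothetical names, and the letter-frequency "embedding" is a deliberately crude substitute for a real vectorizer.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def letter_embed(text: str) -> Vector:
    """Toy embedding (assumption): a 26-dimensional letter-frequency vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; values near 0 mean near-identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat empty vectors as maximally distant
    return 1.0 - dot / norm


class ToySemanticCache:
    """In-memory stand-in for the semantic index (illustration only)."""

    def __init__(self, embed: Callable[[str], Vector], distance_threshold: float = 0.1):
        self.embed = embed
        self.distance_threshold = distance_threshold
        self.entries: list[tuple[Vector, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        """Return the nearest cached response if it is within the threshold."""
        query = self.embed(prompt)
        scored = [(cosine_distance(query, vec), resp) for vec, resp in self.entries]
        if not scored:
            return None
        distance, response = min(scored, key=lambda pair: pair[0])
        return response if distance <= self.distance_threshold else None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))


def answer(prompt: str, cache: ToySemanticCache, call_llm: Callable[[str], str]) -> str:
    """Mirror the before/after callback flow: check the cache, else call and store."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached  # cache hit: the LLM is never called
    response = call_llm(prompt)
    cache.store(prompt, response)
    return response
```

With this sketch, asking "What is Redis?" and then "what is redis" triggers a single call to `call_llm`, while an unrelated prompt misses the cache and triggers a second one.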
Two Provider Options
Self-Hosted (RedisVL)
Use RedisVLCacheProvider when you run your own Redis instance and want full control over the vectorizer and cache index.
```python
from redisvl.utils.vectorize import HFTextVectorizer

from adk_redis.cache import (
    LLMResponseCache,
    LLMResponseCacheConfig,
    RedisVLCacheProvider,
    RedisVLCacheProviderConfig,
)

vectorizer = HFTextVectorizer(model="redis/langcache-embed-v1")

provider = RedisVLCacheProvider(
    config=RedisVLCacheProviderConfig(
        redis_url="redis://localhost:6379",
        name="my_cache",
        ttl=3600,
        distance_threshold=0.1,
    ),
    vectorizer=vectorizer,
)
```
Requirements: pip install 'adk-redis[search]' and a running Redis instance.
Managed (LangCache)
Use LangCacheProvider with Redis LangCache for a fully managed service. No local vectorizer needed; embeddings are handled server-side.
```python
from adk_redis.cache import (
    LLMResponseCache,
    LLMResponseCacheConfig,
    LangCacheProvider,
    LangCacheProviderConfig,
)

provider = LangCacheProvider(
    config=LangCacheProviderConfig(
        cache_id="your-cache-id",
        api_key="your-api-key",
        server_url="https://aws-us-east-1.langcache.redis.io",
        ttl=3600,
    ),
)
```
Requirements: pip install 'adk-redis[langcache]' and a LangCache account.
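In practice the API key is usually read from the environment rather than hard-coded. A minimal sketch, assuming environment variable names chosen here purely for illustration (`LANGCACHE_CACHE_ID`, `LANGCACHE_API_KEY`, `LANGCACHE_SERVER_URL`, `LANGCACHE_TTL` are not defined by adk-redis); the helper `langcache_settings_from_env` is likewise hypothetical and only builds a dict with the same field names used in the example above.

```python
import os
from typing import Mapping


def langcache_settings_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Collect LangCache settings from environment variables (names are assumptions)."""
    try:
        settings: dict = {
            "cache_id": env["LANGCACHE_CACHE_ID"],
            "api_key": env["LANGCACHE_API_KEY"],
        }
    except KeyError as missing:
        # Fail early with a readable message instead of passing None downstream.
        raise RuntimeError(
            f"missing required environment variable: {missing.args[0]}"
        ) from None

    # Optional values are only included when set.
    if "LANGCACHE_SERVER_URL" in env:
        settings["server_url"] = env["LANGCACHE_SERVER_URL"]
    if "LANGCACHE_TTL" in env:
        settings["ttl"] = int(env["LANGCACHE_TTL"])
    return settings
```

The resulting dict could then be unpacked as keyword arguments, for example `LangCacheProviderConfig(**langcache_settings_from_env())`, since its keys match the fields shown above.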
Wiring Into an Agent
Both providers use the same LLMResponseCache wrapper, which produces ADK-compatible callbacks:
```python
from google.adk.agents import Agent

from adk_redis.cache import create_llm_cache_callbacks

# `provider` is either provider built above; LLMResponseCache and
# LLMResponseCacheConfig come from the adk_redis.cache imports shown earlier.
llm_cache = LLMResponseCache(
    provider=provider,
    config=LLMResponseCacheConfig(
        first_message_only=True,  # only cache the first user message
        include_app_name=True,    # scope cache keys by app
        include_user_id=True,     # scope cache keys by user
    ),
)

before_cb, after_cb = create_llm_cache_callbacks(llm_cache)

agent = Agent(
    model="gemini-2.0-flash",
    name="my_agent",
    before_model_callback=before_cb,
    after_model_callback=after_cb,
)
```
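To illustrate what scoping by app and user means, here is a conceptual sketch of how the two `include_*` flags could partition cache entries. How adk-redis implements scoping internally is not described on this page, so `ScopeConfig` and `scope_key` are hypothetical and show the idea only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeConfig:
    """Hypothetical mirror of the two scoping flags shown above."""

    include_app_name: bool = True
    include_user_id: bool = True


def scope_key(config: ScopeConfig, app_name: str, user_id: str) -> str:
    """Build a scope label; entries with different labels never match each other."""
    parts = []
    if config.include_app_name:
        parts.append(f"app={app_name}")
    if config.include_user_id:
        parts.append(f"user={user_id}")
    # With both flags off, every caller shares one global scope.
    return "|".join(parts) or "global"
```

Under this model, enabling user scoping keeps one user's cached answer from being served to another user, at the cost of fewer cache hits. Disabling it raises the hit rate but is only appropriate when responses do not depend on who is asking.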
When to Use Which
| Provider | Use when |
|---|---|
| RedisVL | You already run Redis, want local embeddings, need full control over cache index schema. |
| LangCache | You want a managed service with no infrastructure, server-side embeddings, and built-in analytics. |
Configuration Options
| Option | Provider | Default | Description |
|---|---|---|---|
| distance_threshold | Both | 0.1 | Max vector distance for a cache hit (lower = stricter) |
| ttl | Both | None | Time-to-live in seconds for cache entries |
| name | RedisVL | llmcache | Redis index name for the cache |
| redis_url | RedisVL | redis://localhost:6379 | Redis connection string |
| cache_id | LangCache | Required | LangCache instance identifier |
| api_key | LangCache | Required | LangCache API key |
| use_exact_search | LangCache | True | Enable exact (hash) matching in addition to semantic |
| use_semantic_search | LangCache | True | Enable semantic (vector) matching |
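The ttl and exact (hash) matching options can be illustrated with a small in-memory sketch. It assumes that a ttl of None means entries never expire and that exact matching hashes the prompt text unchanged; both are assumptions made for illustration, and `ToyExactCache` is a hypothetical class rather than part of adk-redis or LangCache.

```python
import hashlib
import time
from typing import Callable, Optional


class ToyExactCache:
    """Exact-match cache with optional per-entry expiration (illustration only)."""

    def __init__(self, ttl: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl      # seconds; None is assumed to mean "never expire"
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[str, Optional[float]]] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        # Hash the prompt as-is, so matching is case- and whitespace-sensitive.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def store(self, prompt: str, response: str) -> None:
        expires_at = None if self.ttl is None else self.clock() + self.ttl
        self._entries[self._key(prompt)] = (response, expires_at)

    def lookup(self, prompt: str) -> Optional[str]:
        key = self._key(prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        response, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            del self._entries[key]  # drop the expired entry lazily on read
            return None
        return response
```

An exact lookup like this is cheap and cannot produce false positives, which is one reason to run it alongside semantic matching: identical prompts hit immediately, and only paraphrases need the vector search.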
Next Steps
- Semantic cache example for a runnable self-hosted demo.
- LangCache example for a runnable managed demo.
- Sessions + Memory services and Sessions + Memory MCP for the other Redis-backed features.
- ADK runtime options for adk web, adk run, and adk api_server.