Semantic Cache
This guide shows how to add semantic caching to a Google ADK agent so that near-duplicate prompts are answered from a cached LLM response instead of triggering a new model call.
For the concepts behind semantic caching, see Semantic Caching.
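As a rough illustration of the idea, the sketch below stores responses alongside embedding vectors and returns a cached response only when a new prompt's embedding lies within a distance threshold of a stored one. The class `ToySemanticCache`, its methods, and the hand-written three-dimensional vectors are assumptions made for this example; they are not the adk-redis API, and a real vectorizer produces much higher-dimensional embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class ToySemanticCache:
    """Illustrative in-memory cache; not the adk-redis implementation."""

    def __init__(self, distance_threshold=0.1):
        self.distance_threshold = distance_threshold
        self._entries = []  # list of (embedding, response) pairs

    def store(self, embedding, response):
        self._entries.append((embedding, response))

    def check(self, embedding):
        """Return the closest cached response within the threshold, else None."""
        best_response = None
        best_distance = None
        for cached_embedding, response in self._entries:
            distance = cosine_distance(embedding, cached_embedding)
            if best_distance is None or distance < best_distance:
                best_response, best_distance = response, distance
        if best_distance is not None and best_distance <= self.distance_threshold:
            return best_response
        return None


cache = ToySemanticCache(distance_threshold=0.1)
cache.store([1.0, 0.0, 0.0], "Paris is the capital of France.")

near_hit = cache.check([0.98, 0.05, 0.0])  # nearly the same direction: hit
far_miss = cache.check([0.0, 1.0, 0.0])    # orthogonal vector: miss
```

Lowering the threshold in this sketch makes the cache stricter, because fewer stored vectors fall within range of a new prompt.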
Option A: Self-hosted with RedisVL
Use RedisVLCacheProvider when you run your own Redis instance and want full
control over the vectorizer and cache index.
Prerequisites
- Redis 8.4+ running locally (see Redis setup).
- The adk-redis package installed with the search extra:
  pip install 'adk-redis[search]'
Setup
from google.adk import Agent
from redisvl.utils.vectorize import HFTextVectorizer

from adk_redis import (
    LLMResponseCache,
    LLMResponseCacheConfig,
    RedisVLCacheProvider,
    RedisVLCacheProviderConfig,
    create_llm_cache_callbacks,
)

# 1. Create a vectorizer (runs locally, no API key needed)
vectorizer = HFTextVectorizer(model="redis/langcache-embed-v1")

# 2. Create the cache provider
provider = RedisVLCacheProvider(
    config=RedisVLCacheProviderConfig(
        redis_url="redis://localhost:6379",
        name="my_cache",
        ttl=3600,
        distance_threshold=0.1,
    ),
    vectorizer=vectorizer,
)

# 3. Create the cache and wire callbacks into the agent
llm_cache = LLMResponseCache(
    provider=provider,
    config=LLMResponseCacheConfig(first_message_only=True),
)
before_cb, after_cb = create_llm_cache_callbacks(llm_cache)

agent = Agent(
    model="gemini-2.0-flash",
    name="cached_agent",
    before_model_callback=before_cb,
    after_model_callback=after_cb,
)
See the semantic_cache example for a runnable version.
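To make the role of the two callbacks concrete, here is a minimal sketch of the before/after pattern using only plain Python. It assumes the callbacks returned by `create_llm_cache_callbacks` follow the usual ADK convention, where a before-model callback that returns a response causes the model call to be skipped. The helpers `make_toy_callbacks`, `run_turn`, and `fake_model` are hypothetical, and an exact-match dict stands in for the semantic lookup.

```python
def make_toy_callbacks(cache):
    """Build (before, after) hooks around a plain dict; hypothetical helper."""

    def before(prompt):
        # A non-None return value stands in for "skip the model call".
        return cache.get(prompt)

    def after(prompt, response):
        # Store the fresh model response so later turns can reuse it.
        cache[prompt] = response

    return before, after


def run_turn(prompt, model, before, after):
    """Simulate one model turn with cache hooks on both sides."""
    cached = before(prompt)
    if cached is not None:
        return cached
    response = model(prompt)
    after(prompt, response)
    return response


calls = []  # records every prompt that actually reached the model


def fake_model(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"


before, after = make_toy_callbacks({})
first = run_turn("What is Redis?", fake_model, before, after)
second = run_turn("What is Redis?", fake_model, before, after)
```

In this sketch the second turn is served from the dict, so `fake_model` runs only once.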
Option B: Managed with LangCache
Use LangCacheProvider with Redis LangCache for a fully managed service. No
local vectorizer or Redis instance is needed; embeddings are computed server-side.
Prerequisites
- A LangCache account and cache ID (sign up at redis.io/langcache).
- The adk-redis package installed with the langcache extra:
  pip install 'adk-redis[langcache]'
Setup
from google.adk import Agent

from adk_redis import (
    LLMResponseCache,
    LLMResponseCacheConfig,
    LangCacheProvider,
    LangCacheProviderConfig,
    create_llm_cache_callbacks,
)

# 1. Create the managed cache provider (replace the placeholder credentials)
provider = LangCacheProvider(
    config=LangCacheProviderConfig(
        cache_id="your-cache-id",
        api_key="your-api-key",
        server_url="https://aws-us-east-1.langcache.redis.io",
        ttl=3600,
    ),
)

# 2. Create the cache and wire callbacks into the agent
llm_cache = LLMResponseCache(
    provider=provider,
    config=LLMResponseCacheConfig(first_message_only=False),
)
before_cb, after_cb = create_llm_cache_callbacks(llm_cache)

agent = Agent(
    model="gemini-2.0-flash",
    name="langcache_agent",
    before_model_callback=before_cb,
    after_model_callback=after_cb,
)
See the langcache_cache example for a runnable version.
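Rather than hard-coding the placeholder credentials, one option is to read them from the environment. The sketch below is a hedged example: the helper `load_langcache_settings` and the variable names `LANGCACHE_CACHE_ID`, `LANGCACHE_API_KEY`, and `LANGCACHE_SERVER_URL` are assumptions chosen for illustration, not names defined by adk-redis or LangCache.

```python
import os

# Default taken from the setup example above.
DEFAULT_SERVER_URL = "https://aws-us-east-1.langcache.redis.io"


def load_langcache_settings(env=None):
    """Collect LangCache settings from a mapping (defaults to os.environ).

    The environment variable names are hypothetical.
    """
    env = os.environ if env is None else env
    required = ("LANGCACHE_CACHE_ID", "LANGCACHE_API_KEY")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "cache_id": env["LANGCACHE_CACHE_ID"],
        "api_key": env["LANGCACHE_API_KEY"],
        "server_url": env.get("LANGCACHE_SERVER_URL", DEFAULT_SERVER_URL),
    }


# Demo with an explicit mapping so the example does not depend on the real environment.
settings = load_langcache_settings(
    {"LANGCACHE_CACHE_ID": "demo-cache", "LANGCACHE_API_KEY": "demo-key"}
)
```

The resulting dict uses the same keys as the keyword arguments shown in the setup, so it could plausibly be unpacked as `LangCacheProviderConfig(**settings, ttl=3600)`.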
Configuration options
| Option | Provider | Default | Description |
|---|---|---|---|
| distance_threshold | Both | 0.1 | Maximum vector distance for a cache hit (lower = stricter) |
| ttl | Both | None | Time-to-live in seconds for cache entries |
| name | RedisVL | llmcache | Redis index name |
| redis_url | RedisVL | redis://localhost:6379 | Redis connection string |
| cache_id | LangCache | Required | LangCache instance identifier |
| api_key | LangCache | Required | LangCache API key |
| first_message_only | Cache config | True | Only cache the first message per session |
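The effect of `ttl` can be pictured with a small in-memory sketch. It assumes that the default of None means entries never expire, which is an interpretation of the table rather than confirmed behavior. The class `ToyTTLCache` and its injectable clock are hypothetical; a real deployment would typically rely on the cache store's own expiry.

```python
import time


class ToyTTLCache:
    """Illustrative TTL cache; not how adk-redis implements expiry."""

    def __init__(self, ttl=None, clock=time.monotonic):
        self.ttl = ttl          # seconds, or None for no expiry (assumed)
        self._clock = clock     # injectable so the demo can fake time
        self._entries = {}      # key -> (stored_at, value)

    def set(self, key, value):
        self._entries[key] = (self._clock(), value)

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self.ttl is not None and self._clock() - stored_at >= self.ttl:
            del self._entries[key]  # drop the expired entry
            return None
        return value


now = [0.0]  # fake clock, advanced manually below
cache = ToyTTLCache(ttl=3600, clock=lambda: now[0])
cache.set("q", "cached answer")

now[0] = 3599.0
fresh = cache.get("q")      # still inside the one-hour window

now[0] = 3600.0
expired = cache.get("q")    # window elapsed, entry is gone

forever = ToyTTLCache(ttl=None, clock=lambda: now[0])
forever.set("q", "kept")
now[0] = 1e9
kept = forever.get("q")     # no ttl, so the entry survives
```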