Semantic Cache

This guide shows how to add semantic caching to a Google ADK agent so that near-duplicate prompts return a cached LLM response instead of making a new call.

For the concepts behind semantic caching, see Semantic Caching.
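The lookup behind a semantic cache can be sketched in a few lines of plain Python. This is a toy illustration, not how adk-redis or RedisVL implement it: `embed`, `cosine_distance`, and `ToySemanticCache` are hypothetical names, and the bag-of-words "embedding" stands in for a real vectorizer.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): word counts instead of a dense model vector."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class ToySemanticCache:
    """Hypothetical in-memory cache that matches prompts by vector distance."""

    def __init__(self, distance_threshold):
        self.distance_threshold = distance_threshold
        self.entries = []  # list of (embedding, response) pairs

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))

    def check(self, prompt):
        query = embed(prompt)
        scored = [(cosine_distance(query, vec), resp) for vec, resp in self.entries]
        if not scored:
            return None
        distance, response = min(scored, key=lambda pair: pair[0])
        # A hit requires the nearest entry to be within the threshold.
        return response if distance <= self.distance_threshold else None


cache = ToySemanticCache(distance_threshold=0.1)
cache.store("what is the capital of france", "Paris")
hit = cache.check("what is the capital of france please")  # near-duplicate prompt
miss = cache.check("how do i bake bread")  # unrelated prompt
```

With this toy embedding the near-duplicate prompt lands within the threshold and reuses the stored response, while the unrelated prompt falls through to a fresh model call.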

Option A: Self-hosted with RedisVL

Use RedisVLCacheProvider when you run your own Redis instance and want full control over the vectorizer and cache index.

Prerequisites

Redis 8.4+ running locally (see Redis setup).
adk-redis installed with the search extra: pip install 'adk-redis[search]'

Setup

from google.adk import Agent
from redisvl.utils.vectorize import HFTextVectorizer

from adk_redis import (
    LLMResponseCache,
    LLMResponseCacheConfig,
    RedisVLCacheProvider,
    RedisVLCacheProviderConfig,
    create_llm_cache_callbacks,
)

# 1. Create a vectorizer (runs locally, no API key needed)
vectorizer = HFTextVectorizer(model="redis/langcache-embed-v1")

# 2. Create the cache provider
provider = RedisVLCacheProvider(
    config=RedisVLCacheProviderConfig(
        redis_url="redis://localhost:6379",
        name="my_cache",
        ttl=3600,
        distance_threshold=0.1,
    ),
    vectorizer=vectorizer,
)

# 3. Create the cache and wire callbacks into the agent
llm_cache = LLMResponseCache(
    provider=provider,
    config=LLMResponseCacheConfig(first_message_only=True),
)
before_cb, after_cb = create_llm_cache_callbacks(llm_cache)

agent = Agent(
    model="gemini-2.0-flash",
    name="cached_agent",
    before_model_callback=before_cb,
    after_model_callback=after_cb,
)

See the semantic_cache example for a runnable version.
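The two callbacks split the work: the before-model callback checks the cache and, on a hit, returns the stored response so the model call is skipped, and the after-model callback stores fresh responses. The sketch below mimics that flow without ADK or Redis. All names (`make_cache_callbacks`, `run_turn`, `fake_model`) are hypothetical, the signatures are simplified, and an exact-match dict replaces the semantic lookup to keep the control flow in focus.

```python
from typing import Callable, Optional, Tuple


def make_cache_callbacks(store: dict) -> Tuple[Callable, Callable]:
    """Return (before, after) callbacks that share one cache store."""

    def before_model(prompt: str) -> Optional[str]:
        # A non-None return value short-circuits the model call.
        return store.get(prompt)

    def after_model(prompt: str, response: str) -> None:
        # Save the fresh response so the next identical prompt is a hit.
        store[prompt] = response

    return before_model, after_model


def run_turn(prompt: str, model: Callable[[str], str],
             before_model: Callable, after_model: Callable) -> str:
    """Stand-in for the framework: consult the cache, else call the model."""
    cached = before_model(prompt)
    if cached is not None:
        return cached
    response = model(prompt)
    after_model(prompt, response)
    return response


model_calls = []


def fake_model(prompt: str) -> str:
    """Hypothetical model that records each call it receives."""
    model_calls.append(prompt)
    return "answer to: " + prompt


before_cb, after_cb = make_cache_callbacks({})
first = run_turn("hello", fake_model, before_cb, after_cb)
second = run_turn("hello", fake_model, before_cb, after_cb)
```

The second turn is served from the store, so `fake_model` runs only once.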

Option B: Managed with LangCache

Use LangCacheProvider with Redis LangCache for a fully managed service. You do not need a local vectorizer or Redis instance; embeddings are computed server-side.

Prerequisites

A LangCache account and cache ID (sign up at redis.io/langcache).
adk-redis installed with the langcache extra: pip install 'adk-redis[langcache]'

Setup

from google.adk import Agent

from adk_redis import (
    LLMResponseCache,
    LLMResponseCacheConfig,
    LangCacheProvider,
    LangCacheProviderConfig,
    create_llm_cache_callbacks,
)

# 1. Create the provider (embeddings are handled server-side, so no vectorizer)
provider = LangCacheProvider(
    config=LangCacheProviderConfig(
        cache_id="your-cache-id",
        api_key="your-api-key",
        server_url="https://aws-us-east-1.langcache.redis.io",
        ttl=3600,
    ),
)

# 2. Create the cache and wire callbacks into the agent
llm_cache = LLMResponseCache(
    provider=provider,
    config=LLMResponseCacheConfig(first_message_only=False),
)
before_cb, after_cb = create_llm_cache_callbacks(llm_cache)

agent = Agent(
    model="gemini-2.0-flash",
    name="langcache_agent",
    before_model_callback=before_cb,
    after_model_callback=after_cb,
)

See the langcache_cache example for a runnable version.
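The setup above hard-codes placeholder credentials. One common alternative is to read them from environment variables. The helper below is a minimal sketch of that idea: the function name and the `LANGCACHE_*` variable names are assumptions made for illustration, not names defined by adk-redis or LangCache.

```python
import os
from typing import Mapping, Optional

# Server URL taken from the setup example above; used when no override is set.
DEFAULT_SERVER_URL = "https://aws-us-east-1.langcache.redis.io"


def langcache_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect LangCache settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    required = ("LANGCACHE_CACHE_ID", "LANGCACHE_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "cache_id": env["LANGCACHE_CACHE_ID"],
        "api_key": env["LANGCACHE_API_KEY"],
        "server_url": env.get("LANGCACHE_SERVER_URL", DEFAULT_SERVER_URL),
        "ttl": int(env.get("LANGCACHE_TTL", "3600")),
    }


# Example with an explicit mapping instead of the real process environment.
settings = langcache_settings(
    {"LANGCACHE_CACHE_ID": "your-cache-id", "LANGCACHE_API_KEY": "your-api-key"}
)
```

The keys mirror the keyword arguments passed to LangCacheProviderConfig in the setup, so the dict could be unpacked as `LangCacheProviderConfig(**settings)`.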

Configuration options

Option	Applies to	Default	Description
`distance_threshold`	Both	`0.1`	Max vector distance for a cache hit (lower = stricter)
`ttl`	Both	`None`	Time-to-live in seconds for cache entries
`name`	RedisVL	`llmcache`	Redis index name
`redis_url`	RedisVL	`redis://localhost:6379`	Redis connection string
`cache_id`	LangCache	Required	LangCache instance identifier
`api_key`	LangCache	Required	LangCache API key
`first_message_only`	Cache config	`True`	Only cache the first message per session
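To make the interplay of `distance_threshold` and `ttl` concrete, the hypothetical helper below decides whether a stored entry counts as a hit. It uses the defaults from the table. The helper name is illustrative, and the boundary handling (`<=` for distance, `>=` for age) is an assumption, since the providers' exact comparisons are not specified here.

```python
from typing import Optional


def is_cache_hit(
    distance: float,
    age_seconds: float,
    distance_threshold: float = 0.1,
    ttl: Optional[int] = None,
) -> bool:
    """Return True when an entry is both fresh enough and close enough."""
    # ttl=None means entries never expire.
    if ttl is not None and age_seconds >= ttl:
        return False
    # Lower thresholds are stricter: the match must be closer to count.
    return distance <= distance_threshold


close_and_fresh = is_cache_hit(distance=0.08, age_seconds=10)
too_strict = is_cache_hit(distance=0.08, age_seconds=10, distance_threshold=0.05)
expired = is_cache_hit(distance=0.08, age_seconds=4000, ttl=3600)
never_expires = is_cache_hit(distance=0.08, age_seconds=4000)
```

The same distance of 0.08 is a hit at the default threshold but a miss at 0.05, and an otherwise matching entry stops counting once its age reaches the TTL.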