Semantic Caching

adk-redis provides semantic caching that skips the LLM call when a user's prompt is identical or semantically similar to one that has already been answered. This reduces latency and cost without requiring changes to the agent's own logic.

Quick Reference

Feature	Details
What it caches	LLM responses keyed by prompt similarity
Similarity	Vector distance between prompt embeddings
Providers	`RedisVLCacheProvider` (self-hosted) or `LangCacheProvider` (managed)
TTL	Configurable per-entry expiration
Integration	ADK `before_model_callback` / `after_model_callback` hooks

How It Works

flowchart TD
    U([User prompt]) --> BC[before_model_callback<br/>embed prompt, search cache]
    BC --> D{Cache hit?}
    D -->|Yes| CR([Return cached response<br/>no LLM call])
    D -->|No| LLM[Call LLM]
    LLM --> AC[after_model_callback<br/>store response in cache]
    AC --> R([Return LLM response])

    subgraph Cache [Redis Cache]
        SE[(Semantic index<br/>prompt embeddings)]
    end

    BC <--> Cache
    AC --> Cache

Before the LLM is called, LLMResponseCache embeds the prompt and searches the cache for the semantically closest entry.
If that entry's vector distance from the prompt is within the configured threshold, the cached response is returned immediately and the LLM is not called.
Otherwise, the LLM runs normally and its response is stored in the cache for future hits.
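
The three steps above can be sketched in plain Python. This is an illustrative in-memory stand-in, not the adk-redis implementation: `toy_embed`, `TinySemanticCache`, and `answer` are hypothetical names, the bag-of-words "embedding" replaces a real model, and cosine distance is assumed as the metric.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity, so 0.0 means the vectors point the same way."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class TinySemanticCache:
    """In-memory cache that returns the nearest stored response if it is close enough."""

    def __init__(self, distance_threshold: float) -> None:
        self.distance_threshold = distance_threshold
        self.entries: List[Tuple[Counter, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        query = toy_embed(prompt)
        best_response, best_distance = None, float("inf")
        for vector, response in self.entries:
            distance = cosine_distance(query, vector)
            if distance < best_distance:
                best_response, best_distance = response, distance
        # A hit requires the nearest entry to be within the threshold.
        if best_distance <= self.distance_threshold:
            return best_response
        return None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((toy_embed(prompt), response))


def answer(prompt: str, cache: TinySemanticCache, call_llm: Callable[[str], str]) -> str:
    """Check the cache first; call the LLM and store its response only on a miss."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached
    response = call_llm(prompt)
    cache.store(prompt, response)
    return response
```

With a toy embedding like this, the threshold has to be fairly loose for paraphrases to match. A trained embedding model places paraphrases much closer together, which is why a small value such as `0.1` can work in practice.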

Two Provider Options

Self-Hosted (RedisVL)

Use RedisVLCacheProvider when you run your own Redis instance and want full control over the vectorizer and cache index.

from redisvl.utils.vectorize import HFTextVectorizer

from adk_redis.cache import (
    LLMResponseCache,
    LLMResponseCacheConfig,
    RedisVLCacheProvider,
    RedisVLCacheProviderConfig,
)

vectorizer = HFTextVectorizer(model="redis/langcache-embed-v1")

provider = RedisVLCacheProvider(
    config=RedisVLCacheProviderConfig(
        redis_url="redis://localhost:6379",
        name="my_cache",
        ttl=3600,
        distance_threshold=0.1,
    ),
    vectorizer=vectorizer,
)

Requirements: pip install 'adk-redis[search]' and a running Redis instance.

Managed (LangCache)

Use LangCacheProvider with Redis LangCache for a fully managed service. No local vectorizer is needed; embeddings are generated server-side.

from adk_redis.cache import (
    LLMResponseCache,
    LLMResponseCacheConfig,
    LangCacheProvider,
    LangCacheProviderConfig,
)

provider = LangCacheProvider(
    config=LangCacheProviderConfig(
        cache_id="your-cache-id",
        api_key="your-api-key",
        server_url="https://aws-us-east-1.langcache.redis.io",
        ttl=3600,
    ),
)

Requirements: pip install 'adk-redis[langcache]' and a LangCache account.
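
Hard-coding the API key as in the snippet above is fine for a quick test, but credentials are usually better read from the environment. Below is a minimal sketch, assuming hypothetical variable names (`LANGCACHE_CACHE_ID`, `LANGCACHE_API_KEY`, `LANGCACHE_SERVER_URL`, `LANGCACHE_TTL_SECONDS`). adk-redis does not define these names, and `langcache_settings` is an illustrative helper rather than part of the library.

```python
import os
from typing import Any, Dict, Mapping

# Hypothetical variable names; choose whatever suits your deployment.
REQUIRED_VARS = ("LANGCACHE_CACHE_ID", "LANGCACHE_API_KEY", "LANGCACHE_SERVER_URL")


def langcache_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect LangCache settings from environment variables, failing fast if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "cache_id": env["LANGCACHE_CACHE_ID"],
        "api_key": env["LANGCACHE_API_KEY"],
        "server_url": env["LANGCACHE_SERVER_URL"],
        # Defaults to one hour, mirroring the example above.
        "ttl": int(env.get("LANGCACHE_TTL_SECONDS", "3600")),
    }
```

The returned keys mirror the `LangCacheProviderConfig` arguments shown above, so the dictionary should be usable as `LangCacheProviderConfig(**langcache_settings())`.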

Wiring Into an Agent

Both providers plug into the same LLMResponseCache wrapper, and create_llm_cache_callbacks turns that wrapper into a pair of ADK-compatible callbacks:

from google.adk.agents import Agent

from adk_redis.cache import create_llm_cache_callbacks

llm_cache = LLMResponseCache(
    provider=provider,
    config=LLMResponseCacheConfig(
        first_message_only=True,   # only cache the first user message
        include_app_name=True,     # scope cache keys by app
        include_user_id=True,      # scope cache keys by user
    ),
)

before_cb, after_cb = create_llm_cache_callbacks(llm_cache)

agent = Agent(
    model="gemini-2.0-flash",
    name="my_agent",
    before_model_callback=before_cb,
    after_model_callback=after_cb,
)
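
In ADK, a `before_model_callback` that returns a response causes the model call to be skipped, while returning `None` lets the request proceed. That contract is what makes the cache-hit path possible. The sketch below imitates it with plain functions and strings so that it runs without ADK or Redis. Everything in it is hypothetical (`make_callbacks`, `run_model_step`, the dict-backed store, the `(app, user, prompt)` key) and only illustrates the control flow and the effect of scoping entries by app and user. Real callbacks receive ADK context and request objects, and adk-redis matches prompts semantically rather than exactly.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (app_name, user_id, prompt)


def make_callbacks(store: Dict[Key, str]):
    """Build a (before, after) callback pair backed by a plain dict."""

    def before(app_name: str, user_id: str, prompt: str) -> Optional[str]:
        # Returning a value means "skip the model"; None means "go ahead".
        return store.get((app_name, user_id, prompt))

    def after(app_name: str, user_id: str, prompt: str, response: str) -> None:
        # Store the fresh response under the scoped key for later requests.
        store[(app_name, user_id, prompt)] = response

    return before, after


def run_model_step(
    app_name: str,
    user_id: str,
    prompt: str,
    model: Callable[[str], str],
    before: Callable[[str, str, str], Optional[str]],
    after: Callable[[str, str, str, str], None],
) -> str:
    """Mimic one model step: before-callback, optional model call, after-callback."""
    cached = before(app_name, user_id, prompt)
    if cached is not None:
        return cached
    response = model(prompt)
    after(app_name, user_id, prompt, response)
    return response
```

Because the user ID is part of the key, two users sending the same prompt do not share an entry. That is the trade-off behind `include_user_id=True`: better isolation in exchange for fewer cache hits.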

When to Use Which

Provider	Use when
RedisVL	You already run Redis, want local embeddings, and need full control over the cache index schema.
LangCache	You want a managed service with no infrastructure, server-side embeddings, and built-in analytics.

Configuration Options

Option	Provider	Default	Description
`distance_threshold`	Both	`0.1`	Max vector distance for a cache hit (lower = stricter)
`ttl`	Both	`None`	Time-to-live in seconds for cache entries
`name`	RedisVL	`llmcache`	Redis index name for the cache
`redis_url`	RedisVL	`redis://localhost:6379`	Redis connection string
`cache_id`	LangCache	Required	LangCache instance identifier
`api_key`	LangCache	Required	LangCache API key
`use_exact_search`	LangCache	`True`	Enable exact (hash) matching in addition to semantic
`use_semantic_search`	LangCache	`True`	Enable semantic (vector) matching
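
To make `use_exact_search` and `ttl` more concrete, here is a rough sketch of hash-based exact matching with expiring entries. It does not show how LangCache or RedisVL implement these options: `ExactMatchCache`, the case and whitespace normalization, and the injectable `clock` are all assumptions made for illustration. In Redis itself, expiry would normally be delegated to key TTLs rather than a timestamp check.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


def prompt_key(prompt: str) -> str:
    """Hash a normalized prompt so trivially different spellings share one key."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ExactMatchCache:
    """Dict-backed exact-match cache with an optional time-to-live in seconds."""

    def __init__(
        self,
        ttl: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[str, Optional[float]]] = {}

    def set(self, prompt: str, response: str) -> None:
        # ttl=None means the entry never expires.
        expires_at = None if self.ttl is None else self.clock() + self.ttl
        self._entries[prompt_key(prompt)] = (response, expires_at)

    def get(self, prompt: str) -> Optional[str]:
        key = prompt_key(prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        response, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        return response
```

An exact lookup like this is cheap, so it makes sense to try it before the vector search and fall back to semantic matching only on a miss.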

Next Steps

Semantic cache example for a runnable self-hosted demo.
LangCache example for a runnable managed demo.
Sessions + Memory services and Sessions + Memory MCP for the other Redis-backed features.
ADK runtime options for adk web, adk run, and adk api_server.