Tool Result Cache Middleware with LangChain Agents#

This notebook demonstrates how to use ToolResultCacheMiddleware with LangChain agents using the standard create_agent pattern. The middleware caches tool execution results, reducing latency and costs for expensive operations.

Key Features#

  • Tool metadata support: Use LangChain’s native tool.metadata = {"cacheable": True/False}

  • Config-based fallback: Configure default caching behavior via cacheable_tools / excluded_tools

  • Semantic matching: Matches semantically similar tool calls based on their arguments, gated by a distance threshold

  • TTL support: Automatic cache expiration

  • API-mode independent: Tool caching works identically regardless of the LLM's response format
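
TTL-based expiry (configured later via `ttl_seconds`) can be pictured with a toy in-memory sketch. This is illustrative only: the actual middleware stores entries in Redis and lets Redis expire the keys.

```python
import time


class TTLCache:
    """Toy in-memory TTL cache; the real middleware delegates expiry to Redis key TTLs."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def set(self, key, value):
        # Record the value alongside its absolute expiry time.
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # lazily evict expired entries on read
            return None
        return value
```

A `get` after the TTL has elapsed behaves like a miss, so the tool simply executes again.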

Cacheability Rules#

  1. Tool metadata takes precedence: If a tool has metadata={"cacheable": False}, it won’t be cached regardless of config

  2. Config as fallback: When no metadata is set, cacheable_tools and excluded_tools config options apply

  3. Safe defaults: Tools with temporal/non-deterministic results should set metadata={"cacheable": False}

Use Cases#

  • Cache: Deterministic calculations, static lookups, expensive but stable API calls

  • Don’t cache: Random number generators, real-time data (stock prices, weather), database queries on changing data

Two API Modes#

Tool caching operates at the tool-call level, independent of how the LLM formats its response. Both Chat Completions (string content) and the Responses API (block content) behave identically:

  • model_default = ChatOpenAI(model="gpt-4o-mini") — string content

  • model_responses_api = ChatOpenAI(model="gpt-4o-mini", use_responses_api=True) — block content

Prerequisites#

  • Redis 8.0+ or Redis Stack (with RedisJSON and RediSearch)

  • OpenAI API key

Setup#

Install the required packages and set your OpenAI API key.

%%capture --no-stderr
# When running via docker-compose, the local library is already installed via editable mount.
# Only install from PyPI if not already available.
try:
    import langgraph.middleware.redis
    print("langgraph-checkpoint-redis already installed")
except ImportError:
    %pip install -U langgraph-checkpoint-redis

%pip install -U langchain langchain-openai sentence-transformers
import getpass
import os


def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")


_set_env("OPENAI_API_KEY")

REDIS_URL = os.environ.get("REDIS_URL", "redis://redis:6379")

Two-Model Setup and Tools#

We create two model instances to demonstrate that tool caching works identically regardless of the LLM response format.

import uuid

from langchain_openai import ChatOpenAI

# Default mode: content is a plain string
model_default = ChatOpenAI(model="gpt-4o-mini")

# Responses API mode: content is a list of blocks with embedded IDs
# Used by Azure OpenAI and advanced features (reasoning, annotations)
model_responses_api = ChatOpenAI(model="gpt-4o-mini", use_responses_api=True)

print("Models created:")
print("- model_default: Chat Completions (string content)")
print("- model_responses_api: Responses API (list-of-blocks content)")
Models created:
- model_default: Chat Completions (string content)
- model_responses_api: Responses API (list-of-blocks content)
def format_content(content, max_len=200):
    """Extract readable text from AI message content (handles both API modes)."""
    if isinstance(content, str):
        text = content
    elif isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, dict):
                parts.append(block.get("text", ""))
            elif isinstance(block, str):
                parts.append(block)
        text = " ".join(parts)
    else:
        text = str(content)
    if max_len and len(text) > max_len:
        return text[:max_len] + "..."
    return text


def inspect_response(result, label=""):
    """Show the structure and content of an AI response."""
    ai_msg = result["messages"][-1]
    print(f"\n--- {label} ---")
    print(f"Content type: {type(ai_msg.content).__name__}")
    if isinstance(ai_msg.content, list):
        print(f"Number of content blocks: {len(ai_msg.content)}")
        for i, block in enumerate(ai_msg.content):
            if isinstance(block, dict):
                print(f"  Block {i}: type={block.get('type')}, has_id={'id' in block}")
    print(f"Response: {format_content(ai_msg.content)}")
    cached = ai_msg.additional_kwargs.get("cached", False)
    print(f"Cached: {cached}")
import ast
import operator as op
import random
import time

from langchain_core.tools import tool

# Safe math evaluator - no arbitrary code execution
SAFE_OPS = {
    ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
    ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg,
}

def _eval_node(node):
    if isinstance(node, ast.Constant):
        return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in SAFE_OPS:
        return SAFE_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in SAFE_OPS:
        return SAFE_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("Unsupported expression")

def safe_eval(expr: str) -> float:
    return _eval_node(ast.parse(expr, mode='eval').body)

# Track actual tool executions
tool_execution_count = {"search": 0, "calculate": 0, "random_number": 0, "get_stock_price": 0}


@tool
def search(query: str) -> str:
    """Search the web for information. Results are relatively stable."""
    tool_execution_count["search"] += 1
    time.sleep(1.0)  # Simulate API call
    return f"Search results for '{query}': Found 10 relevant articles about {query}."


# Mark as cacheable via metadata (deterministic operation)
search.metadata = {"cacheable": True}


@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression. Always deterministic!"""
    tool_execution_count["calculate"] += 1
    time.sleep(0.5)  # Simulate computation
    try:
        result = safe_eval(expression)
        return f"{expression} = {result}"
    except Exception as e:
        return f"Error: {str(e)}"


# Mark as cacheable via metadata (deterministic operation)
calculate.metadata = {"cacheable": True}


@tool
def random_number(max_value: int) -> str:
    """Generate a random number. Non-deterministic - should NOT be cached!"""
    tool_execution_count["random_number"] += 1
    return f"Random number: {random.randint(1, max_value)}"


# Mark as NOT cacheable via metadata (non-deterministic)
random_number.metadata = {"cacheable": False}


@tool
def get_stock_price(symbol: str) -> str:
    """Get current stock price. Temporal data - should NOT be cached!"""
    tool_execution_count["get_stock_price"] += 1
    # Simulate real-time price
    price = 150.00 + random.uniform(-5, 5)
    return f"{symbol}: ${price:.2f}"


# Mark as NOT cacheable via metadata (temporal data)
get_stock_price.metadata = {"cacheable": False}


tools = [search, calculate, random_number, get_stock_price]

print("Tools defined with cacheability metadata:")
for t in tools:
    cacheable = t.metadata.get("cacheable", "not set") if t.metadata else "not set"
    print(f"  - {t.name}: cacheable={cacheable}")
Tools defined with cacheability metadata:
  - search: cacheable=True
  - calculate: cacheable=True
  - random_number: cacheable=False
  - get_stock_price: cacheable=False

Tool Caching with Default Mode#

First, let’s demonstrate tool caching with the standard Chat Completions API.

from langchain.agents import create_agent

from langgraph.middleware.redis import ToolCacheConfig, ToolResultCacheMiddleware

# Unique cache name to avoid collisions
tool_cache_name_default = f"demo_tool_cache_default_{uuid.uuid4().hex[:8]}"

# Create the tool cache middleware
tool_cache_default = ToolResultCacheMiddleware(
    ToolCacheConfig(
        redis_url=REDIS_URL,
        name=tool_cache_name_default,
        distance_threshold=0.1,  # Strict matching for tools
        ttl_seconds=1800,  # 30 minutes
    )
)

print("ToolResultCacheMiddleware created!")
print(f"- Cache name: {tool_cache_name_default}")
print("\nCacheability is determined by tool.metadata['cacheable']:")
print("  - search: cacheable=True (stable results)")
print("  - calculate: cacheable=True (deterministic)")
print("  - random_number: cacheable=False (non-deterministic)")
print("  - get_stock_price: cacheable=False (temporal data)")
ToolResultCacheMiddleware created!
- Cache name: demo_tool_cache_default_c2b26184

Cacheability is determined by tool.metadata['cacheable']:
  - search: cacheable=True (stable results)
  - calculate: cacheable=True (deterministic)
  - random_number: cacheable=False (non-deterministic)
  - get_stock_price: cacheable=False (temporal data)
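
The `distance_threshold` above gates semantic matches: a cached entry counts as a hit only if its argument embedding lies within that vector distance of the new call's. A toy cosine-distance check (illustrative only; the middleware's actual metric and embedding model may differ):

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 for vectors pointing in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


threshold = 0.1
# Identical argument embeddings: distance 0.0 -> within threshold, cache hit
print(cosine_distance([1.0, 0.0], [1.0, 0.0]) <= threshold)  # True
# Orthogonal (unrelated) embeddings: distance 1.0 -> outside threshold, cache miss
print(cosine_distance([1.0, 0.0], [0.0, 1.0]) <= threshold)  # False
```

A strict threshold like 0.1 is sensible for tools, where a near-miss match could return a result for subtly different arguments.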
# Create the agent with tool cache middleware + default model
agent_default = create_agent(
    model=model_default,
    tools=tools,
    middleware=[tool_cache_default],
)

print("Agent created with ToolResultCacheMiddleware (default mode)!")
Agent created with ToolResultCacheMiddleware (default mode)!

Demonstrating Tool Cache Behavior#

Let’s make queries that trigger tool calls and observe how caching works.

Important: We use await agent.ainvoke() because the middleware is async-first.

from langchain_core.messages import HumanMessage

# Reset counters
tool_execution_count = {"search": 0, "calculate": 0, "random_number": 0, "get_stock_price": 0}

# First search query - tool will execute
print("Query 1: 'Search for Python tutorials'")
print("=" * 50)

start = time.time()
result1 = await agent_default.ainvoke({"messages": [HumanMessage(content="Search for Python tutorials")]})
elapsed1 = time.time() - start

print(f"Response: {result1['messages'][-1].content[:150]}...")
print(f"Time: {elapsed1:.2f}s")
print(f"Tool executions: {tool_execution_count}")
Query 1: 'Search for Python tutorials'
==================================================
Response: I found several tutorials about Python. Here are some notable mentions:

1. **Python.org** - Official Python documentation and tutorials.
2. **Codecad...
Time: 4.44s
Tool executions: {'search': 1, 'calculate': 0, 'random_number': 0, 'get_stock_price': 0}
# Similar search query - may or may not hit cache depending on tool arguments
print("\nQuery 2: 'Find Python tutorials online'")
print("=" * 50)

start = time.time()
result2 = await agent_default.ainvoke({"messages": [HumanMessage(content="Find Python tutorials online")]})
elapsed2 = time.time() - start

print(f"Response: {result2['messages'][-1].content[:150]}...")
print(f"Time: {elapsed2:.2f}s")
print(f"Tool executions: {tool_execution_count}")
print("\nNote: Tool cache matches on tool ARGUMENTS, not user messages.")
print("If the LLM generates different search args, it will be a cache miss.")
Query 2: 'Find Python tutorials online'
==================================================
Response: Here are some online resources for Python tutorials:

1. **Codecademy - Learn Python**: An interactive platform that offers free and paid Python cours...
Time: 10.56s
Tool executions: {'search': 2, 'calculate': 0, 'random_number': 0, 'get_stock_price': 0}

Note: Tool cache matches on tool ARGUMENTS, not user messages.
If the LLM generates different search args, it will be a cache miss.
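
To see why similar user messages can still miss, consider an exact-match view of argument-keyed caching. This is a toy sketch; the middleware additionally applies semantic matching within `distance_threshold`, so near-identical arguments can still hit.

```python
import json

# Two similar user messages can lead the LLM to emit different tool arguments.
call_a = {"tool": "search", "args": {"query": "Python tutorials"}}
call_b = {"tool": "search", "args": {"query": "Python tutorials online"}}

# Canonical-JSON keys: any difference in arguments produces a different key.
key_a = json.dumps(call_a, sort_keys=True)
key_b = json.dumps(call_b, sort_keys=True)
print(key_a == key_b)  # False -> cache miss under exact matching
```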
# Calculate query - will be cached
print("\nQuery 3: 'Calculate 25 * 4 + 100'")
print("=" * 50)

start = time.time()
result3 = await agent_default.ainvoke({"messages": [HumanMessage(content="Calculate 25 * 4 + 100")]})
elapsed3 = time.time() - start

print(f"Response: {result3['messages'][-1].content}")
print(f"Time: {elapsed3:.2f}s")
print(f"Tool executions: {tool_execution_count}")
Query 3: 'Calculate 25 * 4 + 100'
==================================================
Response: The result of \( 25 \times 4 + 100 \) is \( 200 \).
Time: 3.08s
Tool executions: {'search': 2, 'calculate': 1, 'random_number': 0, 'get_stock_price': 0}

Tool Caching with Responses API Mode#

Tool caching operates at the tool-call level — it intercepts tool invocations and caches their results. This is completely independent of how the LLM formats its response.

Let’s verify that the same caching behavior works identically with the Responses API.

# Create a separate tool cache for Responses API mode
tool_cache_name_responses = f"demo_tool_cache_responses_{uuid.uuid4().hex[:8]}"

tool_cache_responses = ToolResultCacheMiddleware(
    ToolCacheConfig(
        redis_url=REDIS_URL,
        name=tool_cache_name_responses,
        distance_threshold=0.1,
        ttl_seconds=1800,
    )
)

agent_responses = create_agent(
    model=model_responses_api,
    tools=tools,
    middleware=[tool_cache_responses],
)

# Reset counters
tool_execution_count = {"search": 0, "calculate": 0, "random_number": 0, "get_stock_price": 0}

print("Agent created with ToolResultCacheMiddleware (Responses API mode)!")
Agent created with ToolResultCacheMiddleware (Responses API mode)!
# Search with Responses API mode
print("Query 1 (Responses API): 'Search for machine learning resources'")
print("=" * 50)

start = time.time()
result_resp_1 = await agent_responses.ainvoke(
    {"messages": [HumanMessage(content="Search for machine learning resources")]}
)
elapsed_resp_1 = time.time() - start

print(f"Time: {elapsed_resp_1:.2f}s")
print(f"Tool executions: {tool_execution_count}")
inspect_response(result_resp_1, label="Search (Responses API)")
Query 1 (Responses API): 'Search for machine learning resources'
==================================================
Time: 9.74s
Tool executions: {'search': 1, 'calculate': 0, 'random_number': 0, 'get_stock_price': 0}

--- Search (Responses API) ---
Content type: list
Number of content blocks: 1
  Block 0: type=text, has_id=True
Response: Here are some useful machine learning resources I found:

1. **Coursera** - Offers various courses on machine learning from universities like Stanford and deeplearning.ai.
2. **edX** - Provides access...
Cached: False
# Calculate with Responses API mode
print("\nQuery 2 (Responses API): 'What is 15 * 8 + 20?'")
print("=" * 50)

start = time.time()
result_resp_2 = await agent_responses.ainvoke(
    {"messages": [HumanMessage(content="What is 15 * 8 + 20?")]}
)
elapsed_resp_2 = time.time() - start

print(f"Time: {elapsed_resp_2:.2f}s")
print(f"Tool executions: {tool_execution_count}")
inspect_response(result_resp_2, label="Calculate (Responses API)")

print("\nTool caching works identically regardless of LLM response format!")
Query 2 (Responses API): 'What is 15 * 8 + 20?'
==================================================
Time: 3.50s
Tool executions: {'search': 1, 'calculate': 1, 'random_number': 0, 'get_stock_price': 0}

--- Calculate (Responses API) ---
Content type: list
Number of content blocks: 1
  Block 0: type=text, has_id=True
Response: The result of \( 15 \times 8 + 20 \) is \( 140 \).
Cached: False

Tool caching works identically regardless of LLM response format!

Non-Cacheable Tools#

Tools marked with metadata={"cacheable": False} execute every time, regardless of API mode.

# Non-cacheable tools - should execute every time
print("Testing non-cacheable tools")
print("=" * 50)

# Random number - marked cacheable=False
print("\nRandom numbers (cacheable=False):")
for i in range(3):
    result = await agent_default.ainvoke({"messages": [HumanMessage(content="Generate a random number up to 100")]})
    print(f"  Attempt {i + 1}: {result['messages'][-1].content}")

# Stock price - marked cacheable=False (temporal)
print("\nStock prices (cacheable=False - temporal data):")
for i in range(2):
    result = await agent_default.ainvoke({"messages": [HumanMessage(content="What is the stock price of AAPL?")]})
    print(f"  Attempt {i + 1}: {result['messages'][-1].content}")

print(f"\nTool executions: {tool_execution_count}")
print("\nNote: Tools with metadata={'cacheable': False} are never cached!")
Testing non-cacheable tools
==================================================

Random numbers (cacheable=False):
  Attempt 1: The random number generated is 14.
  Attempt 2: The generated random number is 65.
  Attempt 3: The generated random number is 92.

Stock prices (cacheable=False - temporal data):
  Attempt 1: The current stock price of AAPL (Apple Inc.) is $146.25.
  Attempt 2: The stock price of AAPL is $146.57.

Tool executions: {'search': 1, 'calculate': 1, 'random_number': 3, 'get_stock_price': 2}

Note: Tools with metadata={'cacheable': False} are never cached!
# Summary
print("\n" + "=" * 50)
print("SUMMARY")
print("=" * 50)
print(f"Total tool executions: {tool_execution_count}")
print("\nCacheable tools (metadata={'cacheable': True}):")
print(f"  - search: {tool_execution_count['search']} executions")
print(f"  - calculate: {tool_execution_count['calculate']} executions")
print("\nNon-cacheable tools (metadata={'cacheable': False}):")
print(f"  - random_number: {tool_execution_count['random_number']} executions (never cached)")
print(f"  - get_stock_price: {tool_execution_count['get_stock_price']} executions (never cached)")
print("\nKey takeaways:")
print("  1. Use tool.metadata={'cacheable': True/False} to control caching")
print("  2. Tool caching works identically with both Chat Completions and Responses API modes")
==================================================
SUMMARY
==================================================
Total tool executions: {'search': 1, 'calculate': 1, 'random_number': 3, 'get_stock_price': 2}

Cacheable tools (metadata={'cacheable': True}):
  - search: 1 executions
  - calculate: 1 executions

Non-cacheable tools (metadata={'cacheable': False}):
  - random_number: 3 executions (never cached)
  - get_stock_price: 2 executions (never cached)

Key takeaways:
  1. Use tool.metadata={'cacheable': True/False} to control caching
  2. Tool caching works identically with both Chat Completions and Responses API modes

Cleanup#

# Close all middleware to release Redis connections
await tool_cache_default.aclose()
await tool_cache_responses.aclose()
print("Middleware closed.")
print("Demo complete!")
Middleware closed.
Demo complete!