Conversation Memory Middleware with LangChain Agents#
This notebook demonstrates how to use ConversationMemoryMiddleware with LangChain agents using the standard create_agent pattern. The middleware provides semantic long-term memory by retrieving relevant past conversations.
Key Features#
Semantic retrieval: Find relevant past messages by meaning
Session management: Organize memory by session tags
Context injection: Automatically add relevant history to prompts
Configurable retrieval: Control how many past messages to retrieve
API mode independent: Memory works with both string and block-based content formats
Two API Modes#
The conversation memory middleware stores and retrieves messages regardless of how the LLM formats its responses:
Default (Chat Completions): AIMessage.content is a plain string
Responses API: AIMessage.content is a list of content blocks
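To make the difference concrete, here is a small self-contained sketch of the two content shapes (the sample values are made up; real Responses API blocks carry additional fields):

```python
# Chat Completions mode: content is a plain string.
string_content = "Hello! How can I help?"

# Responses API mode: content is a list of blocks, each typically a dict
# with a "type", "text", and often an embedded "id".
block_content = [{"type": "text", "text": "Hello! How can I help?", "id": "msg_1"}]

def extract_text(content):
    """Normalize either content shape to a plain string."""
    if isinstance(content, str):
        return content
    return " ".join(b.get("text", "") for b in content if isinstance(b, dict))

print(extract_text(string_content))  # Hello! How can I help?
print(extract_text(block_content))   # Hello! How can I help?
```

Because the middleware stores retrieved text rather than raw message objects, both shapes reduce to the same searchable string.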
Both modes are demonstrated side-by-side with different user personas.
Use Cases#
Long-running conversations that exceed context limits
Multi-session agents that remember past interactions
Customer support bots with user history
Prerequisites#
Redis 8.0+ or Redis Stack (with RedisJSON and RediSearch)
OpenAI API key
Note on Async Usage#
The Redis middleware uses async methods internally. When using with create_agent, you must use await agent.ainvoke() rather than agent.invoke().
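The requirement can be illustrated with a stand-in agent that, like the real one, only exposes an async entry point (`FakeAsyncAgent` is hypothetical, purely for illustration):

```python
import asyncio

class FakeAsyncAgent:
    """Stand-in for an agent whose middleware does async I/O (e.g., Redis)."""

    async def ainvoke(self, state):
        await asyncio.sleep(0)  # simulate awaiting a Redis round-trip
        return {"messages": state["messages"] + ["(reply)"]}

async def main():
    agent = FakeAsyncAgent()
    # Must be awaited; a synchronous invoke() is simply not available here.
    return await agent.ainvoke({"messages": ["hi"]})

result = asyncio.run(main())
print(result["messages"][-1])  # (reply)
```

In a notebook, top-level `await agent.ainvoke(...)` works directly because IPython runs cells inside an event loop.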
Setup#
Install required packages and set API keys.
%%capture --no-stderr
# When running via docker-compose, the local library is already installed via editable mount.
# Only install from PyPI if not already available.
try:
    import langgraph.middleware.redis

    print("langgraph-checkpoint-redis already installed")
except ImportError:
    %pip install -U langgraph-checkpoint-redis
%pip install -U langchain langchain-openai sentence-transformers
import getpass
import os
def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")
_set_env("OPENAI_API_KEY")
REDIS_URL = os.environ.get("REDIS_URL", "redis://redis:6379")
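Before handing `REDIS_URL` to the middleware, it can be worth a quick sanity check that the URL parses to the host and port you expect. A sketch using only the standard library:

```python
from urllib.parse import urlsplit

redis_url = "redis://redis:6379"  # same default as above
parts = urlsplit(redis_url)
print(parts.scheme, parts.hostname, parts.port)  # redis redis 6379
```

The `redis:` hostname matches the docker-compose service name; outside Docker you would typically see `localhost` here instead.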
Two-Model Setup and Tools#
We create two model instances to demonstrate memory with both API modes.
import uuid
from langchain_openai import ChatOpenAI
# Default mode: content is a plain string
model_default = ChatOpenAI(model="gpt-4o-mini")
# Responses API mode: content is a list of blocks with embedded IDs
# Used by Azure OpenAI and advanced features (reasoning, annotations)
model_responses_api = ChatOpenAI(model="gpt-4o-mini", use_responses_api=True)
print("Models created:")
print("- model_default: Chat Completions (string content)")
print("- model_responses_api: Responses API (list-of-blocks content)")
Models created:
- model_default: Chat Completions (string content)
- model_responses_api: Responses API (list-of-blocks content)
def format_content(content, max_len=200):
    """Extract readable text from AI message content (handles both API modes)."""
    if isinstance(content, str):
        text = content
    elif isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, dict):
                parts.append(block.get("text", ""))
            elif isinstance(block, str):
                parts.append(block)
        text = " ".join(parts)
    else:
        text = str(content)
    if max_len and len(text) > max_len:
        return text[:max_len] + "..."
    return text
def inspect_response(result, label=""):
    """Show the structure and content of an AI response."""
    ai_msg = result["messages"][-1]
    print(f"\n--- {label} ---")
    print(f"Content type: {type(ai_msg.content).__name__}")
    if isinstance(ai_msg.content, list):
        print(f"Number of content blocks: {len(ai_msg.content)}")
        for i, block in enumerate(ai_msg.content):
            if isinstance(block, dict):
                print(f"  Block {i}: type={block.get('type')}, has_id={'id' in block}")
    print(f"Response: {format_content(ai_msg.content)}")
    cached = ai_msg.additional_kwargs.get("cached", False)
    print(f"Cached: {cached}")
from langchain_core.tools import tool
# Define some tools
@tool
def get_user_preferences(category: str) -> str:
    """Get user preferences for a category."""
    preferences = {
        "food": "Italian cuisine, vegetarian options",
        "music": "Jazz, Classical, Lo-fi",
        "movies": "Sci-fi, Documentaries",
    }
    return preferences.get(category.lower(), "No preferences stored")


@tool
def save_preference(category: str, preference: str) -> str:
    """Save a user preference."""
    return f"Saved preference for {category}: {preference}"
tools = [get_user_preferences, save_preference]
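Stripped of the `@tool` decorator, `get_user_preferences` is just a dict lookup with a default. A minimal check of its two paths (known vs. unknown category, and the `.lower()` normalization), reproduced here as a plain function so it runs standalone:

```python
def lookup_preferences(category: str) -> str:
    # Mirrors the body of get_user_preferences above.
    preferences = {
        "food": "Italian cuisine, vegetarian options",
        "music": "Jazz, Classical, Lo-fi",
        "movies": "Sci-fi, Documentaries",
    }
    return preferences.get(category.lower(), "No preferences stored")

print(lookup_preferences("FOOD"))    # Italian cuisine, vegetarian options
print(lookup_preferences("sports"))  # No preferences stored
```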
Conversation Memory with Default Mode#
First, let’s demonstrate memory with the standard Chat Completions API using Alice’s session.
from langchain.agents import create_agent
from langgraph.middleware.redis import ConversationMemoryMiddleware, ConversationMemoryConfig
# Unique memory name to avoid collisions
memory_name = f"demo_conversation_memory_{uuid.uuid4().hex[:8]}"
# Create the conversation memory middleware for Alice
memory_middleware_alice = ConversationMemoryMiddleware(
    ConversationMemoryConfig(
        redis_url=REDIS_URL,
        name=memory_name,
        session_tag="user_123",  # Identify the user/session
        top_k=3,  # Retrieve top 3 relevant past messages
        distance_threshold=0.7,  # Max cosine distance for relevant messages
    )
)
print("ConversationMemoryMiddleware created for Alice!")
print(f"- Memory name: {memory_name}")
print("- Session: user_123")
print("- Retrieves top 3 relevant past messages")
ConversationMemoryMiddleware created for Alice!
- Memory name: demo_conversation_memory_29e43dd7
- Session: user_123
- Retrieves top 3 relevant past messages
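The `distance_threshold=0.7` setting filters retrieval by cosine distance (1 minus cosine similarity): 0 means the embeddings point in the same direction, 1 means they are orthogonal, so only reasonably similar past messages pass the filter. A minimal sketch of the metric:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; lower = more semantically similar embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 -- identical direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 -- orthogonal
```

Lowering the threshold makes retrieval stricter (fewer, more relevant hits); raising it admits looser matches.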
# Create the agent with conversation memory middleware + default model
agent_alice = create_agent(
    model=model_default,
    tools=tools,
    middleware=[memory_middleware_alice],
)
print("Agent created with ConversationMemoryMiddleware (default mode)!")
Agent created with ConversationMemoryMiddleware (default mode)!
Multi-Turn Conversation with Alice#
Let’s have a multi-turn conversation where the agent should remember previous exchanges.
Important: We use await agent.ainvoke() because the middleware is async-first.
from langchain_core.messages import HumanMessage
# First message - establishing context
print("Turn 1: Introducing myself")
print("="*50)
result1 = await agent_alice.ainvoke({
    "messages": [HumanMessage(content="Hi! My name is Alice and I'm a software engineer.")]
})
print(f"User: Hi! My name is Alice and I'm a software engineer.")
print(f"Agent: {result1['messages'][-1].content}")
Turn 1: Introducing myself
==================================================
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
MPNetModel LOAD REPORT from: sentence-transformers/all-mpnet-base-v2
Key | Status | |
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
User: Hi! My name is Alice and I'm a software engineer.
Agent: Hi Alice! It's great to meet you. As a software engineer, what kind of projects are you currently working on?
# Second message - adding more context
print("\nTurn 2: Sharing interests")
print("="*50)
result2 = await agent_alice.ainvoke({
    "messages": [HumanMessage(content="I'm really interested in machine learning and I work with Python.")]
})
print(f"User: I'm really interested in machine learning and I work with Python.")
print(f"Agent: {result2['messages'][-1].content}")
Turn 2: Sharing interests
==================================================
User: I'm really interested in machine learning and I work with Python.
Agent: That's great to hear, Alice! Python is a fantastic choice for machine learning, given its rich ecosystem of libraries like TensorFlow, PyTorch, and scikit-learn. Are you working on any specific machine learning projects or learning any particular concepts right now?
# Third message - the middleware should recall ML/Python interests from Turn 2
print("\nTurn 3: Asking for recommendations (requires memory of interests)")
print("="*50)
result3 = await agent_alice.ainvoke({
    "messages": [HumanMessage(content="What Python libraries would be most useful for me?")]
})
print(f"User: What Python libraries would be most useful for me?")
print(f"Agent: {result3['messages'][-1].content[:500]}")
print("\nThe middleware should inject ML/Python context so the agent knows your interests.")
Turn 3: Asking for recommendations (requires memory of interests)
==================================================
User: What Python libraries would be most useful for me?
Agent: Since you're interested in machine learning and working with Python, here are some essential libraries that you might find useful:
1. **NumPy**: This library provides support for large multidimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
2. **Pandas**: Ideal for data manipulation and analysis, Pandas makes it easy to work with structured data.
3. **Matplotlib**: A plotting library for creating static, animated, and interactive vis
The middleware should inject ML/Python context so the agent knows your interests.
# Fourth message - testing long-term recall
print("\nTurn 4: Testing recall")
print("="*50)
result4 = await agent_alice.ainvoke({
    "messages": [HumanMessage(content="What's my name and what do I do for work?")]
})
print(f"User: What's my name and what do I do for work?")
print(f"Agent: {result4['messages'][-1].content}")
print("\nThe middleware retrieved relevant past context to answer this!")
Turn 4: Testing recall
==================================================
User: What's my name and what do I do for work?
Agent: Your name is Alice, and you are a software engineer.
The middleware retrieved relevant past context to answer this!
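Conceptually, what happened in Turn 4 is that the messages most relevant to the new question were pulled from Redis and added to the prompt before the model ran. The sketch below is illustrative only, not the middleware's actual implementation:

```python
# Hypothetical retrieved messages (semantically closest to the new question).
retrieved = [
    "Human: Hi! My name is Alice and I'm a software engineer.",
    "Human: I'm really interested in machine learning and I work with Python.",
]

# Context that a middleware might prepend to the prompt as a system message.
system_context = "Relevant prior conversation:\n" + "\n".join(
    f"- {m}" for m in retrieved
)
print(system_context)
```

With that context in the prompt, answering "What's my name?" no longer depends on the current turn's messages alone.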
Conversation Memory with Responses API Mode#
Now let’s demonstrate the same memory behavior with the Responses API. We use a new persona (Carol, an embedded systems engineer) to show that memory works correctly with block-based content.
# Create memory middleware for Carol with Responses API
memory_middleware_carol = ConversationMemoryMiddleware(
    ConversationMemoryConfig(
        redis_url=REDIS_URL,
        name=memory_name,  # Same memory store, different session
        session_tag="user_789",  # Carol's session
        top_k=3,
        distance_threshold=0.7,
    )
)
agent_carol = create_agent(
    model=model_responses_api,
    tools=tools,
    middleware=[memory_middleware_carol],
)
print("Agent created for Carol (Responses API mode)!")
print("- Session: user_789")
Agent created for Carol (Responses API mode)!
- Session: user_789
# Carol Turn 1: Introduction
print("Carol Turn 1: Introduction")
print("="*50)
result_c1 = await agent_carol.ainvoke({
    "messages": [HumanMessage(content="Hi! I'm Carol, an embedded systems engineer. I work with C and Rust.")]
})
print(f"Carol: Hi! I'm Carol, an embedded systems engineer. I work with C and Rust.")
print(f"Agent: {format_content(result_c1['messages'][-1].content)}")
inspect_response(result_c1, label="Carol Turn 1")
Carol Turn 1: Introduction
==================================================
Carol: Hi! I'm Carol, an embedded systems engineer. I work with C and Rust.
Agent: Hi Carol! It's great to meet you. Embedded systems engineering sounds fascinating, especially working with C and Rust. Are there specific projects or technologies you're currently focused on?
--- Carol Turn 1 ---
Content type: list
Number of content blocks: 1
Block 0: type=text, has_id=True
Response: Hi Carol! It's great to meet you. Embedded systems engineering sounds fascinating, especially working with C and Rust. Are there specific projects or technologies you're currently focused on?
Cached: False
# Carol Turn 2: Share interests
print("\nCarol Turn 2: Sharing interests")
print("="*50)
result_c2 = await agent_carol.ainvoke({
    "messages": [HumanMessage(content="I'm interested in RTOS, bare-metal programming, and IoT protocols.")]
})
print(f"Carol: I'm interested in RTOS, bare-metal programming, and IoT protocols.")
print(f"Agent: {format_content(result_c2['messages'][-1].content)}")
inspect_response(result_c2, label="Carol Turn 2")
Carol Turn 2: Sharing interests
==================================================
Carol: I'm interested in RTOS, bare-metal programming, and IoT protocols.
Agent: That sounds like an exciting area to work in! Real-Time Operating Systems (RTOS) and bare-metal programming offer great control over hardware, while IoT protocols are essential for communication in co...
--- Carol Turn 2 ---
Content type: list
Number of content blocks: 1
Block 0: type=text, has_id=True
Response: That sounds like an exciting area to work in! Real-Time Operating Systems (RTOS) and bare-metal programming offer great control over hardware, while IoT protocols are essential for communication in co...
Cached: False
# Carol Turn 3: Test recall (middleware should inject embedded systems context)
print("\nCarol Turn 3: Testing recall with Responses API")
print("="*50)
result_c3 = await agent_carol.ainvoke({
    "messages": [HumanMessage(content="What tools or frameworks would be useful for my work?")]
})
print(f"Carol: What tools or frameworks would be useful for my work?")
print(f"Agent: {format_content(result_c3['messages'][-1].content, max_len=500)}")
inspect_response(result_c3, label="Carol Turn 3 (recall test)")
print("\nThe middleware retrieved Carol's embedded systems context to personalize recommendations!")
Carol Turn 3: Testing recall with Responses API
==================================================
Carol: What tools or frameworks would be useful for my work?
Agent: Here are some tools and frameworks that might be beneficial for your work in embedded systems engineering with C and Rust:
### For C:
1. **GCC (GNU Compiler Collection)**: Standard compiler suite for C, widely used for embedded systems.
2. **Keil uVision**: A development environment for ARM and other microcontrollers.
3. **IAR Embedded Workbench**: Comprehensive IDE with optimization features for C.
4. **PlatformIO**: A cross-platform ecosystem for IoT development that supports multiple embedde...
--- Carol Turn 3 (recall test) ---
Content type: list
Number of content blocks: 1
Block 0: type=text, has_id=True
Response: Here are some tools and frameworks that might be beneficial for your work in embedded systems engineering with C and Rust:
### For C:
1. **GCC (GNU Compiler Collection)**: Standard compiler suite for...
Cached: False
The middleware retrieved Carol's embedded systems context to personalize recommendations!
# Carol Turn 4: Verify name and role recall
print("\nCarol Turn 4: Verify identity recall")
print("="*50)
result_c4 = await agent_carol.ainvoke({
    "messages": [HumanMessage(content="What's my name and what languages do I use?")]
})
print(f"Carol: What's my name and what languages do I use?")
print(f"Agent: {format_content(result_c4['messages'][-1].content)}")
inspect_response(result_c4, label="Carol Turn 4 (identity recall)")
print("\nMemory works correctly with Responses API content blocks!")
Carol Turn 4: Verify identity recall
==================================================
Carol: What's my name and what languages do I use?
Agent: Your name is Carol, and you work with C and Rust.
--- Carol Turn 4 (identity recall) ---
Content type: list
Number of content blocks: 1
Block 0: type=text, has_id=True
Response: Your name is Carol, and you work with C and Rust.
Cached: False
Memory works correctly with Responses API content blocks!
Session Isolation#
Different sessions maintain separate memory spaces. Let’s verify that Alice’s agent (default mode) and Bob’s agent (Responses API mode) don’t share memory, even though they store into the same memory name.
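The isolation comes from tagging every stored message with its session and filtering on that tag at retrieval time. A conceptual sketch (a toy in-memory store, not the middleware's actual Redis code):

```python
# Toy store; the real middleware keeps records in Redis with vector indexes.
records = [
    {"session_tag": "user_123", "text": "My name is Alice"},
    {"session_tag": "user_456", "text": "Hi, I'm Bob and I'm a data scientist!"},
]

def retrieve(session_tag: str) -> list[str]:
    """Only messages tagged with the querying session are visible."""
    return [r["text"] for r in records if r["session_tag"] == session_tag]

print(retrieve("user_456"))  # ["Hi, I'm Bob and I'm a data scientist!"]
```

Because the filter is applied before semantic search, Bob's queries can never surface Alice's messages, no matter how similar they are.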
# Create a new middleware for a different session (Bob uses Responses API)
memory_middleware_bob = ConversationMemoryMiddleware(
    ConversationMemoryConfig(
        redis_url=REDIS_URL,
        name=memory_name,
        session_tag="user_456",  # Different user
        top_k=3,
        distance_threshold=0.7,
    )
)
# Bob uses Responses API mode — session isolation works across API modes
agent_bob = create_agent(
    model=model_responses_api,
    tools=tools,
    middleware=[memory_middleware_bob],
)
print("New session created for user_456 (Bob, Responses API mode)")
print("="*50)
result_bob = await agent_bob.ainvoke({
    "messages": [HumanMessage(content="Hi, I'm Bob and I'm a data scientist!")]
})
print(f"Bob: Hi, I'm Bob and I'm a data scientist!")
print(f"Agent: {format_content(result_bob['messages'][-1].content)}")
inspect_response(result_bob, label="Bob's session (Responses API)")
New session created for user_456 (Bob, Responses API mode)
==================================================
Bob: Hi, I'm Bob and I'm a data scientist!
Agent: Hi Bob! It's great to meet you. As a data scientist, what kind of projects are you currently working on?
--- Bob's session (Responses API) ---
Content type: list
Number of content blocks: 1
Block 0: type=text, has_id=True
Response: Hi Bob! It's great to meet you. As a data scientist, what kind of projects are you currently working on?
Cached: False
# Verify sessions are isolated - ask Bob's agent about Alice
print("\nVerifying session isolation:")
print("="*50)
result_isolation = await agent_bob.ainvoke({
    "messages": [HumanMessage(content="Do you know anyone named Alice?")]
})
print(f"User: Do you know anyone named Alice?")
print(f"Agent: {format_content(result_isolation['messages'][-1].content)}")
inspect_response(result_isolation, label="Isolation test")
print("\nBob's session should NOT know about Alice from the other session.")
print("Session isolation works across different API modes!")
Verifying session isolation:
==================================================
User: Do you know anyone named Alice?
Agent: I don't have personal connections or knowledge of individuals. However, "Alice" is a common name and has been used in various cultural references, like "Alice in Wonderland." Is there something specif...
--- Isolation test ---
Content type: list
Number of content blocks: 1
Block 0: type=text, has_id=True
Response: I don't have personal connections or knowledge of individuals. However, "Alice" is a common name and has been used in various cultural references, like "Alice in Wonderland." Is there something specif...
Cached: False
Bob's session should NOT know about Alice from the other session.
Session isolation works across different API modes!
Cleanup#
# Close all middleware to release Redis connections
await memory_middleware_alice.aclose()
await memory_middleware_carol.aclose()
await memory_middleware_bob.aclose()
print("Middleware closed.")
print("Demo complete!")
Middleware closed.
Demo complete!
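If the demo could fail partway through, an `AsyncExitStack` guarantees every middleware's `aclose()` still runs. A sketch with a stand-in class (`FakeMiddleware` is hypothetical; it only mimics the `aclose()` shape of the Redis middleware):

```python
import asyncio
import contextlib

class FakeMiddleware:
    """Stand-in with the same aclose() shape as the Redis middleware."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True

async def main():
    middlewares = [FakeMiddleware() for _ in range(3)]
    async with contextlib.AsyncExitStack() as stack:
        for mw in middlewares:
            stack.push_async_callback(mw.aclose)
        # ... run the agents here; aclose() runs even if this block raises ...
    return all(mw.closed for mw in middlewares)

print(asyncio.run(main()))  # True
```

This is the async analogue of a `try`/`finally` around the three explicit `aclose()` calls above.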