Conversation Memory Middleware with LangChain Agents#

This notebook demonstrates how to use ConversationMemoryMiddleware with LangChain agents built via the standard create_agent pattern. The middleware provides semantic long-term memory by retrieving relevant past conversations.

Key Features#

  • Semantic retrieval: Find relevant past messages by meaning

  • Session management: Organize memory by session tags

  • Context injection: Automatically add relevant history to prompts

  • Configurable retrieval: Control how many past messages to retrieve

  • API-mode independent: Memory works with both string and block-based content formats

Two API Modes#

The conversation memory middleware stores and retrieves messages regardless of how the LLM formats its responses:

  • Default (Chat Completions): AIMessage.content is a plain string

  • Responses API: AIMessage.content is a list of content blocks

Both modes are demonstrated side-by-side with different user personas.
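To make the difference concrete, here is a minimal, LangChain-independent sketch of the two content shapes and a helper that normalizes both to plain text. The block structure mirrors what the Responses API returns, but the field values ("msg_abc123", the text itself) are illustrative only.

```python
# Illustrative content shapes -- real messages carry more metadata.
chat_completions_content = "Hello!"  # default mode: plain string

responses_api_content = [            # Responses API mode: list of typed blocks
    {"type": "text", "text": "Hello!", "id": "msg_abc123"},
]


def to_text(content):
    """Normalize either content shape to a plain string."""
    if isinstance(content, str):
        return content
    return " ".join(
        block.get("text", "") for block in content if isinstance(block, dict)
    )


print(to_text(chat_completions_content))  # Hello!
print(to_text(responses_api_content))     # Hello!
```

Because the middleware stores and retrieves text rather than raw message objects, both shapes feed into memory the same way.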

Use Cases#

  • Long-running conversations that exceed context limits

  • Multi-session agents that remember past interactions

  • Customer support bots with user history

Prerequisites#

  • Redis 8.0+ or Redis Stack (with RedisJSON and RediSearch)

  • OpenAI API key

Note on Async Usage#

The Redis middleware uses async methods internally. When using it with create_agent, you must call await agent.ainvoke() rather than agent.invoke().
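In a notebook, top-level await works directly; in a plain script there is no running event loop, so the coroutine must be driven with asyncio.run. A minimal sketch, with a hypothetical ainvoke standing in for the real agent's async entry point:

```python
import asyncio


# Hypothetical stand-in for agent.ainvoke -- the real agent is async-first,
# so its invoke path must be awaited inside an event loop.
async def ainvoke(payload):
    return {"messages": payload["messages"] + ["(reply)"]}


async def main():
    # Inside a coroutine, await works as in the notebook cells below.
    return await ainvoke({"messages": ["hi"]})


# In a script, drive the coroutine with asyncio.run; in a notebook cell,
# `await ainvoke(...)` at the top level is equivalent.
result = asyncio.run(main())
print(result["messages"])  # ['hi', '(reply)']
```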

Setup#

Install required packages and set API keys.

%%capture --no-stderr
# When running via docker-compose, the local library is already installed via editable mount.
# Only install from PyPI if not already available.
try:
    import langgraph.middleware.redis
    print("langgraph-checkpoint-redis already installed")
except ImportError:
    %pip install -U langgraph-checkpoint-redis

%pip install -U langchain langchain-openai sentence-transformers
import getpass
import os


def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")


_set_env("OPENAI_API_KEY")

REDIS_URL = os.environ.get("REDIS_URL", "redis://redis:6379")

Two-Model Setup and Tools#

We create two model instances to demonstrate memory with both API modes.

import uuid

from langchain_openai import ChatOpenAI

# Default mode: content is a plain string
model_default = ChatOpenAI(model="gpt-4o-mini")

# Responses API mode: content is a list of blocks with embedded IDs
# Used by Azure OpenAI and advanced features (reasoning, annotations)
model_responses_api = ChatOpenAI(model="gpt-4o-mini", use_responses_api=True)

print("Models created:")
print("- model_default: Chat Completions (string content)")
print("- model_responses_api: Responses API (list-of-blocks content)")
Models created:
- model_default: Chat Completions (string content)
- model_responses_api: Responses API (list-of-blocks content)
def format_content(content, max_len=200):
    """Extract readable text from AI message content (handles both API modes)."""
    if isinstance(content, str):
        text = content
    elif isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, dict):
                parts.append(block.get("text", ""))
            elif isinstance(block, str):
                parts.append(block)
        text = " ".join(parts)
    else:
        text = str(content)
    if max_len and len(text) > max_len:
        return text[:max_len] + "..."
    return text


def inspect_response(result, label=""):
    """Show the structure and content of an AI response."""
    ai_msg = result["messages"][-1]
    print(f"\n--- {label} ---")
    print(f"Content type: {type(ai_msg.content).__name__}")
    if isinstance(ai_msg.content, list):
        print(f"Number of content blocks: {len(ai_msg.content)}")
        for i, block in enumerate(ai_msg.content):
            if isinstance(block, dict):
                print(f"  Block {i}: type={block.get('type')}, has_id={'id' in block}")
    print(f"Response: {format_content(ai_msg.content)}")
    cached = ai_msg.additional_kwargs.get("cached", False)
    print(f"Cached: {cached}")
from langchain_core.tools import tool


# Define some tools
@tool
def get_user_preferences(category: str) -> str:
    """Get user preferences for a category."""
    preferences = {
        "food": "Italian cuisine, vegetarian options",
        "music": "Jazz, Classical, Lo-fi",
        "movies": "Sci-fi, Documentaries",
    }
    return preferences.get(category.lower(), "No preferences stored")


@tool
def save_preference(category: str, preference: str) -> str:
    """Save a user preference."""
    return f"Saved preference for {category}: {preference}"


tools = [get_user_preferences, save_preference]

Conversation Memory with Default Mode#

First, let’s demonstrate memory with the standard Chat Completions API using Alice’s session.

from langchain.agents import create_agent
from langgraph.middleware.redis import ConversationMemoryMiddleware, ConversationMemoryConfig

# Unique memory name to avoid collisions
memory_name = f"demo_conversation_memory_{uuid.uuid4().hex[:8]}"

# Create the conversation memory middleware for Alice
memory_middleware_alice = ConversationMemoryMiddleware(
    ConversationMemoryConfig(
        redis_url=REDIS_URL,
        name=memory_name,
        session_tag="user_123",  # Identify the user/session
        top_k=3,  # Retrieve top 3 relevant past messages
        distance_threshold=0.7,  # Max cosine distance for relevant messages
    )
)

print("ConversationMemoryMiddleware created for Alice!")
print(f"- Memory name: {memory_name}")
print("- Session: user_123")
print("- Retrieves top 3 relevant past messages")
ConversationMemoryMiddleware created for Alice!
- Memory name: demo_conversation_memory_29e43dd7
- Session: user_123
- Retrieves top 3 relevant past messages
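The distance_threshold of 0.7 filters retrieved messages by cosine distance (1 minus cosine similarity): only messages whose embedding is within that distance of the query embedding are injected. A rough sketch of the filtering logic, using toy 3-d vectors in place of real sentence embeddings:

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 = identical direction, 1 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0, 0.0]
candidates = {
    "close match": [0.9, 0.1, 0.0],
    "unrelated":   [0.0, 1.0, 0.0],
}

for label, vec in candidates.items():
    d = cosine_distance(query, vec)
    kept = d <= 0.7  # mirrors distance_threshold=0.7
    print(f"{label}: distance={d:.3f} kept={kept}")
```

Lowering the threshold makes retrieval stricter (fewer, more relevant messages); raising it pulls in looser matches.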
# Create the agent with conversation memory middleware + default model
agent_alice = create_agent(
    model=model_default,
    tools=tools,
    middleware=[memory_middleware_alice],
)

print("Agent created with ConversationMemoryMiddleware (default mode)!")
Agent created with ConversationMemoryMiddleware (default mode)!

Multi-Turn Conversation with Alice#

Let’s have a multi-turn conversation where the agent should remember previous exchanges.

Important: We use await agent.ainvoke() because the middleware is async-first.

from langchain_core.messages import HumanMessage

# First message - establishing context
print("Turn 1: Introducing myself")
print("="*50)

result1 = await agent_alice.ainvoke({
    "messages": [HumanMessage(content="Hi! My name is Alice and I'm a software engineer.")]
})
print("User: Hi! My name is Alice and I'm a software engineer.")
print(f"Agent: {result1['messages'][-1].content}")
Turn 1: Introducing myself
==================================================
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
MPNetModel LOAD REPORT from: sentence-transformers/all-mpnet-base-v2
Key                     | Status
------------------------+-----------
embeddings.position_ids | UNEXPECTED

Notes:
- UNEXPECTED: can be ignored when loading from a different task/architecture; not OK if you expect an identical arch.
User: Hi! My name is Alice and I'm a software engineer.
Agent: Hi Alice! It's great to meet you. As a software engineer, what kind of projects are you currently working on?
# Second message - adding more context
print("\nTurn 2: Sharing interests")
print("="*50)

result2 = await agent_alice.ainvoke({
    "messages": [HumanMessage(content="I'm really interested in machine learning and I work with Python.")]
})
print("User: I'm really interested in machine learning and I work with Python.")
print(f"Agent: {result2['messages'][-1].content}")
Turn 2: Sharing interests
==================================================
User: I'm really interested in machine learning and I work with Python.
Agent: That's great to hear, Alice! Python is a fantastic choice for machine learning, given its rich ecosystem of libraries like TensorFlow, PyTorch, and scikit-learn. Are you working on any specific machine learning projects or learning any particular concepts right now?
# Third message - the middleware should recall ML/Python interests from Turn 2
print("\nTurn 3: Asking for recommendations (requires memory of interests)")
print("="*50)

result3 = await agent_alice.ainvoke({
    "messages": [HumanMessage(content="What Python libraries would be most useful for me?")]
})
print("User: What Python libraries would be most useful for me?")
print(f"Agent: {result3['messages'][-1].content[:500]}")
print("\nThe middleware should inject ML/Python context so the agent knows your interests.")
Turn 3: Asking for recommendations (requires memory of interests)
==================================================
User: What Python libraries would be most useful for me?
Agent: Since you're interested in machine learning and working with Python, here are some essential libraries that you might find useful:

1. **NumPy**: This library provides support for large multidimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

2. **Pandas**: Ideal for data manipulation and analysis, Pandas makes it easy to work with structured data.

3. **Matplotlib**: A plotting library for creating static, animated, and interactive vis

The middleware should inject ML/Python context so the agent knows your interests.
# Fourth message - testing long-term recall
print("\nTurn 4: Testing recall")
print("="*50)

result4 = await agent_alice.ainvoke({
    "messages": [HumanMessage(content="What's my name and what do I do for work?")]
})
print("User: What's my name and what do I do for work?")
print(f"Agent: {result4['messages'][-1].content}")
print("\nThe middleware retrieved relevant past context to answer this!")
Turn 4: Testing recall
==================================================
User: What's my name and what do I do for work?
Agent: Your name is Alice, and you are a software engineer.

The middleware retrieved relevant past context to answer this!

Conversation Memory with Responses API Mode#

Now let’s demonstrate the same memory behavior with the Responses API. We use a new persona (Carol, an embedded systems engineer) to show that memory works correctly with block-based content.

# Create memory middleware for Carol with Responses API
memory_middleware_carol = ConversationMemoryMiddleware(
    ConversationMemoryConfig(
        redis_url=REDIS_URL,
        name=memory_name,  # Same memory store, different session
        session_tag="user_789",  # Carol's session
        top_k=3,
        distance_threshold=0.7,
    )
)

agent_carol = create_agent(
    model=model_responses_api,
    tools=tools,
    middleware=[memory_middleware_carol],
)

print("Agent created for Carol (Responses API mode)!")
print("- Session: user_789")
Agent created for Carol (Responses API mode)!
- Session: user_789
# Carol Turn 1: Introduction
print("Carol Turn 1: Introduction")
print("="*50)

result_c1 = await agent_carol.ainvoke({
    "messages": [HumanMessage(content="Hi! I'm Carol, an embedded systems engineer. I work with C and Rust.")]
})
print("Carol: Hi! I'm Carol, an embedded systems engineer. I work with C and Rust.")
print(f"Agent: {format_content(result_c1['messages'][-1].content)}")
inspect_response(result_c1, label="Carol Turn 1")
Carol Turn 1: Introduction
==================================================
Carol: Hi! I'm Carol, an embedded systems engineer. I work with C and Rust.
Agent: Hi Carol! It's great to meet you. Embedded systems engineering sounds fascinating, especially working with C and Rust. Are there specific projects or technologies you're currently focused on?

--- Carol Turn 1 ---
Content type: list
Number of content blocks: 1
  Block 0: type=text, has_id=True
Response: Hi Carol! It's great to meet you. Embedded systems engineering sounds fascinating, especially working with C and Rust. Are there specific projects or technologies you're currently focused on?
Cached: False
# Carol Turn 2: Share interests
print("\nCarol Turn 2: Sharing interests")
print("="*50)

result_c2 = await agent_carol.ainvoke({
    "messages": [HumanMessage(content="I'm interested in RTOS, bare-metal programming, and IoT protocols.")]
})
print("Carol: I'm interested in RTOS, bare-metal programming, and IoT protocols.")
print(f"Agent: {format_content(result_c2['messages'][-1].content)}")
inspect_response(result_c2, label="Carol Turn 2")
Carol Turn 2: Sharing interests
==================================================
Carol: I'm interested in RTOS, bare-metal programming, and IoT protocols.
Agent: That sounds like an exciting area to work in! Real-Time Operating Systems (RTOS) and bare-metal programming offer great control over hardware, while IoT protocols are essential for communication in co...

--- Carol Turn 2 ---
Content type: list
Number of content blocks: 1
  Block 0: type=text, has_id=True
Response: That sounds like an exciting area to work in! Real-Time Operating Systems (RTOS) and bare-metal programming offer great control over hardware, while IoT protocols are essential for communication in co...
Cached: False
# Carol Turn 3: Test recall (middleware should inject embedded systems context)
print("\nCarol Turn 3: Testing recall with Responses API")
print("="*50)

result_c3 = await agent_carol.ainvoke({
    "messages": [HumanMessage(content="What tools or frameworks would be useful for my work?")]
})
print("Carol: What tools or frameworks would be useful for my work?")
print(f"Agent: {format_content(result_c3['messages'][-1].content, max_len=500)}")
inspect_response(result_c3, label="Carol Turn 3 (recall test)")
print("\nThe middleware retrieved Carol's embedded systems context to personalize recommendations!")
Carol Turn 3: Testing recall with Responses API
==================================================
Carol: What tools or frameworks would be useful for my work?
Agent: Here are some tools and frameworks that might be beneficial for your work in embedded systems engineering with C and Rust:

### For C:
1. **GCC (GNU Compiler Collection)**: Standard compiler suite for C, widely used for embedded systems.
2. **Keil uVision**: A development environment for ARM and other microcontrollers.
3. **IAR Embedded Workbench**: Comprehensive IDE with optimization features for C.
4. **PlatformIO**: A cross-platform ecosystem for IoT development that supports multiple embedde...

--- Carol Turn 3 (recall test) ---
Content type: list
Number of content blocks: 1
  Block 0: type=text, has_id=True
Response: Here are some tools and frameworks that might be beneficial for your work in embedded systems engineering with C and Rust:

### For C:
1. **GCC (GNU Compiler Collection)**: Standard compiler suite for...
Cached: False

The middleware retrieved Carol's embedded systems context to personalize recommendations!
# Carol Turn 4: Verify name and role recall
print("\nCarol Turn 4: Verify identity recall")
print("="*50)

result_c4 = await agent_carol.ainvoke({
    "messages": [HumanMessage(content="What's my name and what languages do I use?")]
})
print("Carol: What's my name and what languages do I use?")
print(f"Agent: {format_content(result_c4['messages'][-1].content)}")
inspect_response(result_c4, label="Carol Turn 4 (identity recall)")
print("\nMemory works correctly with Responses API content blocks!")
Carol Turn 4: Verify identity recall
==================================================
Carol: What's my name and what languages do I use?
Agent: Your name is Carol, and you work with C and Rust.

--- Carol Turn 4 (identity recall) ---
Content type: list
Number of content blocks: 1
  Block 0: type=text, has_id=True
Response: Your name is Carol, and you work with C and Rust.
Cached: False

Memory works correctly with Responses API content blocks!

Session Isolation#

Different sessions maintain separate memory spaces. Let’s verify that Alice’s agent (default mode) and Bob’s agent (using a different model mode) don’t share memory.
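Conceptually, every stored message carries its session_tag, and retrieval filters on that tag before any semantic ranking, so sessions never bleed into each other. A toy illustration of this filtering step (the real storage schema lives in Redis and will differ; the record fields here are made up):

```python
# Hypothetical in-memory stand-in for the Redis-backed message store.
stored = [
    {"session_tag": "user_123", "text": "My name is Alice and I'm a software engineer."},
    {"session_tag": "user_456", "text": "Hi, I'm Bob and I'm a data scientist!"},
]


def retrieve(session_tag, records):
    """Filter by session tag before ranking, so sessions stay isolated."""
    return [r["text"] for r in records if r["session_tag"] == session_tag]


print(retrieve("user_456", stored))  # only Bob's messages come back
```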

# Create a new middleware for a different session (Bob uses Responses API)
memory_middleware_bob = ConversationMemoryMiddleware(
    ConversationMemoryConfig(
        redis_url=REDIS_URL,
        name=memory_name,
        session_tag="user_456",  # Different user
        top_k=3,
        distance_threshold=0.7,
    )
)

# Bob uses Responses API mode — session isolation works across API modes
agent_bob = create_agent(
    model=model_responses_api,
    tools=tools,
    middleware=[memory_middleware_bob],
)

print("New session created for user_456 (Bob, Responses API mode)")
print("="*50)

result_bob = await agent_bob.ainvoke({
    "messages": [HumanMessage(content="Hi, I'm Bob and I'm a data scientist!")]
})
print("Bob: Hi, I'm Bob and I'm a data scientist!")
print(f"Agent: {format_content(result_bob['messages'][-1].content)}")
inspect_response(result_bob, label="Bob's session (Responses API)")
New session created for user_456 (Bob, Responses API mode)
==================================================
Bob: Hi, I'm Bob and I'm a data scientist!
Agent: Hi Bob! It's great to meet you. As a data scientist, what kind of projects are you currently working on?

--- Bob's session (Responses API) ---
Content type: list
Number of content blocks: 1
  Block 0: type=text, has_id=True
Response: Hi Bob! It's great to meet you. As a data scientist, what kind of projects are you currently working on?
Cached: False
# Verify sessions are isolated - ask Bob's agent about Alice
print("\nVerifying session isolation:")
print("="*50)

result_isolation = await agent_bob.ainvoke({
    "messages": [HumanMessage(content="Do you know anyone named Alice?")]
})
print("User: Do you know anyone named Alice?")
print(f"Agent: {format_content(result_isolation['messages'][-1].content)}")
inspect_response(result_isolation, label="Isolation test")
print("\nBob's session should NOT know about Alice from the other session.")
print("Session isolation works across different API modes!")
Verifying session isolation:
==================================================
User: Do you know anyone named Alice?
Agent: I don't have personal connections or knowledge of individuals. However, "Alice" is a common name and has been used in various cultural references, like "Alice in Wonderland." Is there something specif...

--- Isolation test ---
Content type: list
Number of content blocks: 1
  Block 0: type=text, has_id=True
Response: I don't have personal connections or knowledge of individuals. However, "Alice" is a common name and has been used in various cultural references, like "Alice in Wonderland." Is there something specif...
Cached: False

Bob's session should NOT know about Alice from the other session.
Session isolation works across different API modes!

Cleanup#

# Close all middleware to release Redis connections
await memory_middleware_alice.aclose()
await memory_middleware_carol.aclose()
await memory_middleware_bob.aclose()
print("Middleware closed.")
print("Demo complete!")
Middleware closed.
Demo complete!