
Making your LLM remember things across conversations


Chat Memory

LLMs don't remember anything

LLMs are stateless. Tell it your name in one message, ask for it in the next — no idea. Every request starts fresh. If you want conversation context, you have to manage it yourself.

That's what ChatMemory does. It stores and retrieves messages so the model can keep up.
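The core contract is simple: store messages keyed by a conversation ID, hand them back later. A toy sketch of that idea in plain Java (not Spring AI's implementation, just the shape of it):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy chat memory: messages keyed by conversation ID.
// Illustrates the contract only; Spring AI's ChatMemory adds message
// types, eviction policies, and pluggable repositories on top.
class ToyChatMemory {
    private final Map<String, List<String>> store = new HashMap<>();

    void add(String conversationId, String message) {
        store.computeIfAbsent(conversationId, id -> new ArrayList<>()).add(message);
    }

    List<String> get(String conversationId) {
        return store.getOrDefault(conversationId, List.of());
    }
}

public class Demo {
    public static void main(String[] args) {
        ToyChatMemory memory = new ToyChatMemory();
        memory.add("007", "My name is James Bond");
        memory.add("007", "Hello, Mr. Bond.");
        memory.add("008", "Unrelated conversation");

        System.out.println(memory.get("007").size()); // 2
        System.out.println(memory.get("008").size()); // 1
    }
}
```

Each conversation ID gets its own isolated history — that isolation is the whole trick.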

Quick distinction: chat memory is the sliding window of context the model actually sees; chat history is the full record of every message exchanged. This post is about the former.

Quick Start

Spring AI auto-configures a ChatMemory bean. By default it uses in-memory storage with MessageWindowChatMemory (keeps last 20 messages).

@Autowired
ChatMemory chatMemory;

That's it — memory works. Now let's put it to use.

Mongo + sliding window

This config uses Mongo-backed memory with a sliding window of 25 messages.

@Bean
ChatClient chatClient(ChatModel chatModel, MongoChatMemoryRepository chatMemoryRepository) {
    ChatMemory chatMemory = MessageWindowChatMemory.builder()
        .chatMemoryRepository(chatMemoryRepository) // Mongo-backed persistence
        .maxMessages(25)                            // sliding window of 25 messages
        .build();

    return ChatClient.builder(chatModel)
        .defaultAdvisors(
            new SimpleLoggerAdvisor(),                           // logs requests/responses
            MessageChatMemoryAdvisor.builder(chatMemory).build() // injects history
        )
        .build();
}

This is a good default for demos and internal assistants.

Using memory with ChatClient

The easiest approach — use MessageChatMemoryAdvisor:

ChatMemory chatMemory = MessageWindowChatMemory.builder().build();

ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
    .build();

Now when you make calls, pass a conversation ID:

String conversationId = "007";

chatClient.prompt()
    .user("My name is James Bond")
    .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
    .call()
    .content();

// Later...
String response = chatClient.prompt()
    .user("What is my name?")
    .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
    .call()
    .content();

// Response: "Your name is James Bond"

The advisor handles everything — storing messages, retrieving history, injecting it into the prompt.
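Conceptually, the advisor does three things on every call: load the history, send history plus the new message to the model, then persist both sides of the turn. A plain-Java sketch of that flow with a stubbed model (illustrative only, not Spring AI internals):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch of the advisor's per-call flow: retrieve history, call the
// model with history + new message, persist both sides of the turn.
public class AdvisorFlow {
    static String chat(List<String> history, String userMessage,
                       Function<List<String>, String> model) {
        List<String> prompt = new ArrayList<>(history);
        prompt.add("user: " + userMessage);       // inject the new message
        String reply = model.apply(prompt);       // model sees full context
        history.add("user: " + userMessage);      // store the user turn
        history.add("assistant: " + reply);       // store the model's reply
        return reply;
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>();
        // Stub model: reports how many messages of context it received.
        Function<List<String>, String> model = p -> "saw " + p.size() + " messages";

        System.out.println(chat(history, "My name is James Bond", model)); // saw 1 messages
        System.out.println(chat(history, "What is my name?", model));      // saw 3 messages
    }
}
```

The second call sees three messages because the first turn (user message plus reply) was stored — that growing context is what lets the model "remember".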

Endpoint pattern

Endpoints can use an optional conversationId, falling back to a generated UUID:

private String resolveConversationId(String conversationId) {
    return StringUtils.hasText(conversationId)
        ? conversationId
        : UUID.randomUUID().toString();
}

String result = chatClient.prompt()
    .system(systemPrompt)
    .user(userPrompt)
    .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, resolveConversationId(conversationId)))
    .call()
    .content();

Simple and practical. Same endpoint works for one-shot and multi-turn usage.
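The fallback behavior is easy to verify in isolation. Here's the same helper sketched without the Spring dependency (a null/blank check standing in for `StringUtils.hasText`):

```java
import java.util.UUID;

// Same fallback logic as the endpoint helper, minus Spring:
// reuse a caller-supplied ID, otherwise mint a fresh one per request.
public class ConversationIds {
    static String resolve(String conversationId) {
        return (conversationId != null && !conversationId.isBlank())
            ? conversationId
            : UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("007"));  // caller's ID: multi-turn
        String a = resolve(null);
        String b = resolve(null);
        System.out.println(a.equals(b));     // false: each one-shot call
                                             // gets an isolated conversation
    }
}
```

Clients that omit the ID get a fresh conversation every time, so stale context can never leak between unrelated requests.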

Memory Window Size

By default, MessageWindowChatMemory keeps the last 20 messages. When it exceeds that, older messages get evicted (system messages are preserved).

MessageWindowChatMemory memory = MessageWindowChatMemory.builder()
    .maxMessages(10)  // keep last 10
    .build();
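The eviction policy described above can be sketched in plain Java: system messages always survive, and of the rest only the most recent N are kept. This is a simplified model of the behavior, not Spring AI's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sliding-window policy: system messages always survive,
// the rest is trimmed to the most recent maxMessages entries.
public class WindowPolicy {
    record Msg(String role, String text) {}

    static List<Msg> window(List<Msg> all, int maxMessages) {
        List<Msg> system = new ArrayList<>();
        List<Msg> rest = new ArrayList<>();
        for (Msg m : all) {
            (m.role().equals("system") ? system : rest).add(m);
        }
        // Keep only the tail of the non-system messages.
        List<Msg> tail = rest.subList(Math.max(0, rest.size() - maxMessages), rest.size());
        List<Msg> kept = new ArrayList<>(system);
        kept.addAll(tail);
        return kept;
    }

    public static void main(String[] args) {
        List<Msg> all = new ArrayList<>();
        all.add(new Msg("system", "You are helpful"));
        for (int i = 1; i <= 30; i++) all.add(new Msg("user", "msg " + i));

        List<Msg> kept = window(all, 10);
        System.out.println(kept.size());        // 11: system + last 10
        System.out.println(kept.get(1).text()); // msg 21
    }
}
```

Pick the window size against your token budget: every kept message is re-sent to the model on every call.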

Three types of memory advisors

Spring AI gives you three flavors:

  1. MessageChatMemoryAdvisor — injects history as messages. Most natural approach.

  2. PromptChatMemoryAdvisor — appends history to the system prompt as text. Useful when you need more control.

  3. VectorStoreChatMemoryAdvisor — stores history in a vector store. Retrieves semantically relevant messages instead of just the last N. Good for long conversations.

// Vector store approach
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(
        VectorStoreChatMemoryAdvisor.builder(vectorStore).build()
    )
    .build();
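The difference between the first two flavors is where history lands: MessageChatMemoryAdvisor passes it as structured messages, while PromptChatMemoryAdvisor flattens it into text appended to the system prompt. A rough sketch of that flattening (the real advisor uses its own template):

```java
import java.util.List;

// Rough sketch of prompt-style memory: history is flattened into the
// system prompt as text rather than passed as structured messages.
public class PromptStyleMemory {
    static String augmentSystemPrompt(String systemPrompt, List<String> history) {
        if (history.isEmpty()) return systemPrompt;
        StringBuilder sb = new StringBuilder(systemPrompt);
        sb.append("\n\nConversation so far:\n");
        for (String turn : history) {
            sb.append("- ").append(turn).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String prompt = augmentSystemPrompt(
            "You are a helpful assistant.",
            List.of("user: My name is James Bond", "assistant: Nice to meet you"));
        System.out.println(prompt);
    }
}
```

Structured messages preserve roles for the model; the text approach trades that away for full control over exactly how history is phrased.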

Persistent storage

In-memory is fine for dev. Production needs persistence. Spring AI supports a bunch of backends:

JDBC (PostgreSQL, MySQL, SQL Server, Oracle)

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-jdbc</artifactId>
</dependency>

@Autowired
JdbcChatMemoryRepository chatMemoryRepository;

ChatMemory chatMemory = MessageWindowChatMemory.builder()
    .chatMemoryRepository(chatMemoryRepository)
    .maxMessages(10)
    .build();

Schema initializes automatically. Control it with:

spring.ai.chat.memory.repository.jdbc.initialize-schema=always

MongoDB

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-mongodb</artifactId>
</dependency>

@Autowired
MongoChatMemoryRepository chatMemoryRepository;

Supports TTL for automatic cleanup:

# TTL in seconds (2592000 = 30 days)
spring.ai.chat.memory.repository.mongo.ttl=2592000

Cosmos DB

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-cosmos-db</artifactId>
</dependency>

Uses conversation ID as partition key — efficient at scale. Supports both key-based and Azure Identity authentication.

Cassandra

Good for durability and when you want time-to-live on messages (e.g., auto-delete after 3 years for compliance).

Neo4j

If you're already in the Neo4j ecosystem, this stores messages as graph nodes with relationships.

Using Memory Directly with ChatModel

If you're working at the ChatModel level instead of ChatClient, manage memory manually:

ChatMemory chatMemory = MessageWindowChatMemory.builder().build();
String conversationId = "007";

// First interaction
UserMessage msg1 = new UserMessage("My name is James Bond");
chatMemory.add(conversationId, msg1);
ChatResponse response1 = chatModel.call(new Prompt(chatMemory.get(conversationId)));
chatMemory.add(conversationId, response1.getResult().getOutput());

// Second interaction
UserMessage msg2 = new UserMessage("What is my name?");
chatMemory.add(conversationId, msg2);
ChatResponse response2 = chatModel.call(new Prompt(chatMemory.get(conversationId)));
// response2 contains "James Bond"

More boilerplate, but you get full control over what goes in and out of memory.

What to Remember

  1. Use advisors (MessageChatMemoryAdvisor) unless you need fine-grained control
  2. Set a reasonable window size — 20 messages is the default, adjust based on your token budget
  3. Use persistent storage in production — in-memory is for development only
  4. Conversation IDs matter — they're how you separate different users/sessions
  5. Chat Memory ≠ Chat History — memory is a sliding window for the model, history is the full record
  6. Use generated UUID fallback when conversationId is missing — safer for API consumers