Chat Memory in Python
LLMs do not remember by default
Just as in Spring AI, the models themselves are stateless. If you want memory, you send the conversation history with every call.
So memory in Python is usually one of two paths:
- Manual message list (OpenAI SDK)
- Memory abstraction (LangChain)
Level 1: Manual memory (OpenAI SDK)
from openai import OpenAI
client = OpenAI()
messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def chat(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer
chat("My name is Sam")
print(chat("What's my name?"))
# "Your name is Sam."
This is the cleanest way to understand memory. No abstraction, just message history.
Level 2: Conversation IDs (multiple users)
One global message list breaks in multi-user apps: every user would share the same history. Keep a per-user store instead.
from collections import defaultdict
conversations = defaultdict(list)
def chat_with_memory(user_id: str, user_text: str) -> str:
    if not conversations[user_id]:
        conversations[user_id].append({"role": "system", "content": "You are helpful."})
    conversations[user_id].append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=conversations[user_id]
    )
    answer = response.choices[0].message.content
    conversations[user_id].append({"role": "assistant", "content": answer})
    return answer
Now each user has isolated memory.
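For example, two callers never see each other's history (the user IDs here are made up):

print(chat_with_memory("alice", "My name is Alice"))
print(chat_with_memory("bob", "What's my name?"))  # Bob's history contains no name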
Level 3: Sliding window memory
Full history grows forever (cost + latency). Keep recent messages only.
def trim_memory(history: list, max_messages: int = 20) -> list:
    # Preserve the system prompt, keep only the newest non-system messages
    system = [m for m in history if m["role"] == "system"]
    non_system = [m for m in history if m["role"] != "system"]
    trimmed = non_system[-max_messages:]
    return system[:1] + trimmed

# Run this before each model call:
conversations[user_id] = trim_memory(conversations[user_id], max_messages=12)
This is basically MessageWindowChatMemory, Python style.
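Counting messages is a rough proxy for cost. If you want to trim by an actual token budget instead, here is a sketch using tiktoken (assuming gpt-4o's o200k_base encoding; the function name and default budget are made up):

import tiktoken

def trim_to_token_budget(history: list, max_tokens: int = 3000) -> list:
    enc = tiktoken.get_encoding("o200k_base")  # the encoding gpt-4o uses
    system = [m for m in history if m["role"] == "system"]
    non_system = [m for m in history if m["role"] != "system"]
    kept, used = [], 0
    # Walk backwards from the newest message until the budget runs out
    for m in reversed(non_system):
        used += len(enc.encode(m["content"]))
        if used > max_tokens:
            break
        kept.append(m)
    return system[:1] + list(reversed(kept))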
Level 4: LangChain Memory
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain
llm = ChatOpenAI(model="gpt-4o")
memory = ConversationBufferWindowMemory(k=6)  # keep the last 6 user/assistant exchanges
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
conversation.predict(input="My name is Sam")
print(conversation.predict(input="What's my name?"))
If you're already in LangChain, this reduces boilerplate fast. (Note that ConversationChain and the classic memory classes are deprecated in recent LangChain releases in favor of RunnableWithMessageHistory, but the windowing idea is the same.)
Level 5: Persistent memory
For production, persist memory in Redis/Postgres/Mongo so restarts don't erase context.
The pattern is simple (see the sketch after this list):
- load_messages(conversation_id)
- call the model
- save_messages(conversation_id, updated_messages)
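A minimal sketch with Redis via the redis-py client (the key scheme and chat_persistent name are assumptions; any store with get/set works the same way). It reuses the OpenAI client from Level 1:

import json
import redis

r = redis.Redis()  # assumes a local Redis instance

def load_messages(conversation_id: str) -> list:
    raw = r.get(f"chat:{conversation_id}")
    return json.loads(raw) if raw else []

def save_messages(conversation_id: str, messages: list) -> None:
    r.set(f"chat:{conversation_id}", json.dumps(messages))

def chat_persistent(conversation_id: str, user_text: str) -> str:
    history = load_messages(conversation_id)
    if not history:
        history.append({"role": "system", "content": "You are helpful."})
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=history
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    save_messages(conversation_id, history)  # context survives restarts
    return answer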
What to Remember
- Memory is your responsibility unless a framework handles it for you
- Use per-conversation IDs for multi-user apps
- Always trim history (token cost matters)
- Persist memory if conversation continuity matters across restarts