
Managing conversation memory with plain OpenAI SDK and LangChain memory abstractions

chat-memory conversation openai langchain state python

Chat Memory in Python

LLMs do not remember by default

Same story as with Spring AI: models are stateless. If you want memory, you send the history with every call.

So memory in Python is usually one of two paths:

  1. Manual message list (OpenAI SDK)
  2. Memory abstraction (LangChain)

Level 1: Manual memory (OpenAI SDK)

from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def chat(user_text: str) -> str:
    # add the user's turn to the running history
    messages.append({"role": "user", "content": user_text})

    # the model only "remembers" because we resend the whole list
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )

    answer = response.choices[0].message.content
    # store the reply so the next call sees it too
    messages.append({"role": "assistant", "content": answer})
    return answer

chat("My name is Sam")
print(chat("What's my name?"))
# "Your name is Sam."

This is the cleanest way to understand memory. No abstraction, just message history.

Level 2: Conversation IDs (multiple users)

One global message list breaks in multi-user apps. Use per-user memory stores.

from collections import defaultdict

conversations = defaultdict(list)  # one message history per conversation/user

def chat_with_memory(user_id: str, user_text: str) -> str:
    if not conversations[user_id]:
        conversations[user_id].append({"role": "system", "content": "You are helpful."})

    conversations[user_id].append({"role": "user", "content": user_text})

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=conversations[user_id]
    )

    answer = response.choices[0].message.content
    conversations[user_id].append({"role": "assistant", "content": answer})
    return answer

Now each user has isolated memory.
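A quick usage sketch (the user IDs are made up) to show the isolation:

chat_with_memory("alice", "My name is Alice")
chat_with_memory("bob", "My name is Bob")

print(chat_with_memory("alice", "What's my name?"))
# "Your name is Alice." Bob's history never leaks into Alice's conversation.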

Level 3: Sliding window memory

Full history grows forever (cost + latency). Keep recent messages only.

def trim_memory(history: list, max_messages: int = 20) -> list:
    # keep the system prompt out of the window so it never gets trimmed away
    system = [m for m in history if m["role"] == "system"]
    non_system = [m for m in history if m["role"] != "system"]

    # keep only the most recent messages
    trimmed = non_system[-max_messages:]
    return system[:1] + trimmed

conversations[user_id] = trim_memory(conversations[user_id], max_messages=12)

This is basically Spring AI's MessageWindowChatMemory, Python style.
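A toy check of what the trim keeps (the messages here are made up):

history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "first question"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "second question"},
    {"role": "assistant", "content": "second answer"},
]

print(trim_memory(history, max_messages=2))
# keeps the system prompt plus only the two most recent non-system messages:
# system, "second question", "second answer"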

Level 4: LangChain Memory

from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o")
memory = ConversationBufferWindowMemory(k=6)  # keep the last 6 interactions (human + AI pairs)

conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

conversation.predict(input="My name is Sam")
print(conversation.predict(input="What's my name?"))

If you're already in LangChain, this reduces boilerplate fast.

Level 5: Persistent memory

For production, persist memory in Redis/Postgres/Mongo so restarts don't erase context.

The pattern is simple (minimal sketch after the list):

  1. load_messages(conversation_id)
  2. call model
  3. save_messages(conversation_id, updated_messages)
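A minimal sketch of that pattern with Redis, assuming a local Redis instance, the client object from Level 1, and JSON-serializable message dicts; the key prefix and helper names are just illustrative:

import json
import redis

r = redis.Redis()

def load_messages(conversation_id: str) -> list:
    # fall back to a fresh history with just the system prompt
    raw = r.get(f"chat:{conversation_id}")
    return json.loads(raw) if raw else [
        {"role": "system", "content": "You are helpful."}
    ]

def save_messages(conversation_id: str, messages: list) -> None:
    r.set(f"chat:{conversation_id}", json.dumps(messages))

def chat_persistent(conversation_id: str, user_text: str) -> str:
    messages = load_messages(conversation_id)
    messages.append({"role": "user", "content": user_text})

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )

    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    save_messages(conversation_id, messages)
    return answer

Combine it with the Level 3 trim before saving so the stored history stays bounded.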

What to Remember

  1. Memory is your responsibility unless a framework handles it
  2. Use per-conversation IDs for multi-user apps
  3. Always trim history (token cost matters)
  4. Persist memory if conversation continuity matters across restarts