Structured Output in Python
The problem
LLMs return free-form text by default, but your app usually needs typed data.
Instead of scraping that prose with regexes, ask the model for structured output directly.
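To see why the regex route is brittle, here is a minimal stdlib sketch (the pattern and sample sentences are illustrative):

```python
import re

PATTERN = re.compile(r"(\w+) lives in (\w+)")

# Works on the exact phrasing it was written for...
match = PATTERN.match("Rita lives in Seattle")
print(match.groups())  # ('Rita', 'Seattle')

# ...but a slight rewording already defeats the pattern.
print(PATTERN.match("Rita is based in Seattle"))  # None
```

Every new phrasing means another pattern to write and maintain; structured output pushes that burden onto the model.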
Level 1: JSON Mode (simple)
```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Return valid JSON only."},
        {"role": "user", "content": "Extract name and city from: 'Rita lives in Seattle'"},
    ],
)
print(response.choices[0].message.content)
# {"name": "Rita", "city": "Seattle"}
```
An easy start, but the model still returns a JSON string that you have to parse yourself.
Level 2: Parse into Pydantic model
```python
import json

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    city: str

raw = response.choices[0].message.content
data = json.loads(raw)
person = Person.model_validate(data)
print(person.name, person.city)
```
Now the result is a typed object, validated at runtime.
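When the model returns malformed output, fail fast rather than guess. A minimal sketch (`parse_person` is a hypothetical helper, not part of any SDK):

```python
import json

from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    city: str

def parse_person(raw: str) -> Person:
    # Raise a clear error instead of silently accepting bad model output.
    try:
        return Person.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as exc:
        raise ValueError(f"Invalid person payload: {exc}") from exc

person = parse_person('{"name": "Rita", "city": "Seattle"}')
print(person.name, person.city)  # Rita Seattle
```

Pydantic v2 also offers `Person.model_validate_json(raw)`, which combines the JSON decode and validation steps.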
Level 3: Strict Schema via response_format
Use an explicit JSON Schema when the output format must be exact.
```python
schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "person_extraction",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "city": {"type": "string"},
            },
            "required": ["name", "city"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}
```
```python
response = client.chat.completions.create(
    model="gpt-4o",
    response_format=schema,
    messages=[
        {"role": "system", "content": "Extract person data."},
        {"role": "user", "content": "Rita lives in Seattle"},
    ],
)
```
This is closer to a contract-first API: with `strict` enabled, the schema is enforced rather than merely requested.
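Hand-writing the schema dict duplicates the Pydantic model from Level 2. One option (a sketch, assuming Pydantic v2) is to generate the schema from the model instead:

```python
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    city: str

# Derive the JSON Schema from the model instead of writing it by hand.
person_schema = Person.model_json_schema()
# OpenAI's strict mode requires additionalProperties to be False.
person_schema["additionalProperties"] = False

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "person_extraction",
        "schema": person_schema,
        "strict": True,
    },
}
print(person_schema["required"])  # ['name', 'city']
```

Recent versions of the openai SDK can also accept a Pydantic model directly via `client.beta.chat.completions.parse`, which handles this wiring for you.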
Level 4: LangChain Output Parser
```python
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

parser = PydanticOutputParser(pydantic_object=Person)
prompt = PromptTemplate(
    template="Extract person from text.\n{format_instructions}\nText: {text}",
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
llm = ChatOpenAI(model="gpt-4o")
chain = prompt | llm | parser
person = chain.invoke({"text": "Rita lives in Seattle"})
print(person)
```
LangChain injects the format instructions into the prompt and parses the response, all in one pipeline.
When to use what?
- Simple extraction: JSON mode
- Business-critical schema: strict JSON schema
- LangChain pipeline app: PydanticOutputParser
What to Remember
- Structured output beats regex parsing almost always
- Pydantic gives runtime validation and clean typed objects
- Use strict schemas when downstream systems are sensitive
- Fail fast on invalid output instead of silently guessing