
Getting JSON and typed objects instead of free-form text using OpenAI response_format and Pydantic

structured-output pydantic json openai langchain python

Structured Output in Python

The problem

LLMs return text by default. Your app usually needs typed data.

Instead of scraping free-form prose with regex, ask the model for structured output directly.

Level 1: JSON Mode (simple)

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Return valid JSON only."},
        {"role": "user", "content": "Extract name and city from: 'Rita lives in Seattle'"}
    ]
)

print(response.choices[0].message.content)
# {"name":"Rita","city":"Seattle"}

An easy start, but you still get back a JSON string and have to parse it yourself. Note that JSON mode requires the word "JSON" to appear somewhere in your messages, which the system prompt above satisfies.

Level 2: Parse into Pydantic model

from pydantic import BaseModel
import json

class Person(BaseModel):
    name: str
    city: str

# Reuse the JSON-mode response from Level 1
raw = response.choices[0].message.content
data = json.loads(raw)
person = Person.model_validate(data)  # raises ValidationError on mismatch

print(person.name, person.city)

Now your code works with typed objects, and Pydantic validates the data at runtime instead of trusting the model.
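When the model omits a field or returns the wrong type, model_validate raises a ValidationError, so bad output fails loudly instead of propagating. A minimal sketch (the malformed dict below is simulated, not real model output):

```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    city: str

# Simulated malformed model output: "city" is missing
bad = {"name": "Rita"}

try:
    Person.model_validate(bad)
except ValidationError as e:
    # e.errors() records which field failed and why
    print(e.error_count(), "error at", e.errors()[0]["loc"])
```

Catch the error at the boundary where you call the model, not deep inside business logic.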

Level 3: Strict Schema via response_format

Use an explicit JSON schema when the output format must be exact.

schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "person_extraction",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "city": {"type": "string"}
            },
            "required": ["name", "city"],
            "additionalProperties": False
        },
        "strict": True
    }
}

response = client.chat.completions.create(
    model="gpt-4o",
    response_format=schema,
    messages=[
        {"role": "system", "content": "Extract person data."},
        {"role": "user", "content": "Rita lives in Seattle"}
    ]
)

This is closer to contract-first APIs: the model is constrained to the schema rather than merely asked to follow it.
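Hand-writing the schema duplicates the Pydantic model from Level 2. One option (a sketch, not the only approach) is to generate it with model_json_schema() and add the strict-mode requirement yourself:

```python
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    city: str

# Generate the JSON schema from the model instead of writing it by hand
generated = Person.model_json_schema()
generated["additionalProperties"] = False  # strict mode requires this

schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "person_extraction",
        "schema": generated,
        "strict": True,
    },
}

print(schema["json_schema"]["schema"]["required"])  # ['name', 'city']
```

One source of truth: change the Pydantic model and the wire schema follows.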

Level 4: LangChain Output Parser

from langchain_core.output_parsers import PydanticOutputParser
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

parser = PydanticOutputParser(pydantic_object=Person)

prompt = PromptTemplate(
    template="Extract person from text.\n{format_instructions}\nText: {text}",
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

llm = ChatOpenAI(model="gpt-4o")
chain = prompt | llm | parser

person = chain.invoke({"text": "Rita lives in Seattle"})
print(person)

LangChain handles prompt instructions + parsing pipeline cleanly.

When to use what?

  1. Simple extraction: JSON mode
  2. Business-critical schema: strict JSON schema
  3. LangChain pipeline app: PydanticOutputParser

What to Remember

  1. Structured output almost always beats regex parsing
  2. Pydantic gives runtime validation and clean typed objects
  3. Use strict schemas when downstream systems are sensitive
  4. Fail fast on invalid output instead of silently guessing
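
Point 4 can be made concrete with a small helper (parse_or_fail is a hypothetical name, and collapsing both failure modes into ValueError is one design choice among several):

```python
import json
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    city: str

def parse_or_fail(raw: str) -> Person:
    """Fail fast: raise instead of guessing when model output is malformed."""
    try:
        return Person.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as e:
        raise ValueError(f"Model returned invalid output: {e}") from e

person = parse_or_fail('{"name": "Rita", "city": "Seattle"}')
print(person.city)  # Seattle
```

Callers see one well-defined exception and can decide whether to retry, log, or abort.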