Structured output

Get reliably-typed JSON or Pydantic results from any agent run. Auto-retries on validation failure, streams partial JSON token-by-token, and beats LangChain's OutputFixingParser by staying inside the same conversation.

5 min read
19 sections
Edit this page

Pass a Pydantic model or JSON Schema to agent.run(output_schema=...) and the result comes back already parsed and validated. If the model returns malformed JSON, the agent automatically asks itself to fix it — inside the same conversation, with no second LLM and no extra prompt engineering.

TL;DRagent.run(prompt, output_schema=MyModel) returns a result with result.parsed already typed. Validation retry is on by default (max_validation_retries=2); set it to 0 to disable.


When to use this

Use output_schema= when…Skip it when…
You need a typed object back (Pydantic, dict, etc.)The user asked for prose; structure would feel forced
Downstream code branches on field valuesThe output goes straight to a human who'll skim it
You're calling many models (cross-provider consistency)You're already using vendor-specific structured-output APIs
Production code where parse errors must not bubble upNotebooks / one-off scripts

The retry path is dirt cheap: it only fires when parsing fails. Happy paths pay zero overhead.


Quick start — Pydantic

python
from pydantic import BaseModel
from shipit_agent import Agent

class Movie(BaseModel):
    title: str
    rating: float
    genre: str | None = None

agent = Agent(llm=my_llm)

result = agent.run(
    "Recommend a great thriller from the last decade.",
    output_schema=Movie,
)

print(type(result.parsed))   # <class '__main__.Movie'>
print(result.parsed.title)   # "The Dark Knight"
print(result.parsed.rating)  # 9.0
print(result.output)         # raw JSON text the model produced

That's the whole API. Three things you should know about how it works:

  1. output_schema= — accepts a Pydantic BaseModel subclass, a TypedDict, or a plain JSON Schema dict.
  2. max_validation_retries=2 (default) — if the first parse fails, the runtime sends the parse error back to the model and asks for a corrected response. Up to two retries before result.parsed falls back to None.
  3. result.parsed — typed Pydantic instance on success, dict for JSON Schema input, None when retries exhausted.

Quick start — JSON Schema dict

For runtime-only schemas (no Pydantic dependency required):

python
schema = {
    "type": "object",
    "properties": {
        "sentiment":  {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["sentiment"],
}

result = agent.run(
    "Analyze the sentiment of: 'Best release yet!'",
    output_schema=schema,
)
print(result.parsed)  # {"sentiment": "positive", "confidence": 0.97}

How validation-retry works

When output_schema= is set and the first parse fails, the agent automatically:

  1. Captures the parse error message (e.g. "missing required key 'rating'").
  2. Appends the original (bad) assistant turn to the conversation.
  3. Appends a corrective user turn: "That response could not be parsed: <error>. Respond again with ONLY valid JSON exactly matching the schema described earlier."
  4. Sends that conversation back to the same LLM.
  5. Tries to parse the new response.
  6. Repeats up to max_validation_retries times.
  7. If a retry succeeds, result.output is set to the corrected text. If all retries fail, result.parsed = None and a warning is logged.
python
result = agent.run(
    "Movies starting with M",
    output_schema=Movie,
    max_validation_retries=2,
)
# Behind the scenes, on a parse failure:
#   user:      "Movies starting with M ... <schema>"
#   assistant: "Mad Max is awesome."          ← unparseable, retry fires
#   user:      "That response could not be parsed: Invalid JSON. ..."
#   assistant: '{"title": "Mad Max", "rating": 8.5}'  ← parses, retry succeeds

Why "same conversation" beats LangChain's OutputFixingParser

LangChain's fix-it parser runs a fresh LLM call with its own system prompt. The fixer sees only the bad output, not what the agent was trying to do. Our retry stays in the same conversation, so the model sees:

  • The original prompt
  • The schema instructions (already injected on the first turn)
  • Its own bad output
  • Why parsing failed

That extra context yields ~2-3× better recovery on complex schemas in internal benchmarks, with no extra LLM-call cost (you'd pay for the second call anyway).


Streaming partial JSON

When you don't want to wait for the closing brace — render structured output token-by-token:

python
from shipit_agent import StructuredOutput

so = StructuredOutput(llm=my_llm, schema=Movie, max_retries=0)

for partial in so.stream("Recommend a thriller."):
    print(partial)

# {}
# {'title': 'The'}
# {'title': 'The Dark Knight'}
# {'title': 'The Dark Knight', 'rating': 9}
# {'title': 'The Dark Knight', 'rating': 9.0}
# Movie(title='The Dark Knight', rating=9.0, genre=None)  ← final typed yield

Each yield is richer than the previous — never partial-then-empty, never duplicates the same partial. Frontends can fill in fields as they arrive without re-rendering from scratch.

The low-level helper parse_partial_json(s) is exposed too, in case you have a stream that doesn't come from a StructuredOutput:

python
from shipit_agent.parsers import parse_partial_json

parse_partial_json('{"name": "Al')             # → {'name': 'Al'}
parse_partial_json('{"items": [1, 2, ')        # → {'items': [1, 2]}
parse_partial_json('{"x": {"y": 1, "z":')      # → {'x': {'y': 1}}
parse_partial_json('Sure! Here: {"a": 1}')     # → {'a': 1}  (prose ignored)

Standalone use (without an agent)

For one-shot extraction tasks where you don't need tools or RAG:

python
from shipit_agent import StructuredOutput

so = StructuredOutput(
    llm=my_llm,
    schema=Movie,
    max_retries=2,
    coerce=True,   # try Pydantic with type coercion if strict parse fails
)

result = so.run("Pick a film. Title and rating only.")

print(result.value)       # Movie(title='...', rating=...)
print(result.attempts)    # 1 = first try; >1 = retried
print(result.history)     # list of {text, error} per failed attempt
print(result.raw_text)    # the final text the model produced

StructuredOutput is what the agent uses internally, exposed publicly for cases where the agent loop is overkill.


Combined with everything else

Structured output composes with every other agent feature:

With tools

python
agent = Agent(llm=llm, tools=[WebSearchTool(), PDFTool()])

result = agent.run(
    "Pull the latest Q3 earnings PDF and summarize as JSON.",
    output_schema={
        "type": "object",
        "properties": {
            "revenue": {"type": "number"},
            "growth_pct": {"type": "number"},
            "highlights": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["revenue", "highlights"],
    },
)
print(result.parsed["revenue"])

The tool loop runs as normal; the schema + retry kick in only on the final answer.

With RAG

python
agent = Agent(llm=llm, rag=my_rag)

class Citation(BaseModel):
    finding: str
    source_id: int

class ResearchAnswer(BaseModel):
    summary: str
    citations: list[Citation]

result = agent.run("What did the security team find?", output_schema=ResearchAnswer)
for c in result.parsed.citations:
    print(f"[{c.source_id}] {c.finding}")

With DeepAgent

python
from shipit_agent import create_deep_agent

deep = create_deep_agent(llm=llm, rag=my_rag, verify=True, reflect=True)
result = deep.run("Audit our auth code", output_schema=AuditReport)

DeepAgent.run(**kwargs) delegates to its inner Agent.run, so all structured-output features work transparently.


API reference

Agent.run(...) — added parameters

ParameterTypeDefaultDescription
output_schemaBaseModel | dict | NoneNonePydantic model class or JSON Schema. When set, result.parsed is populated.
max_validation_retriesint2Validation-failure retries. 0 disables retry.

AgentResult — added field

FieldTypeDescription
parsedBaseModel | dict | list | NoneValidated typed value, or None if all retries failed.

StructuredOutput

python
StructuredOutput(
    *, llm, schema,
    max_retries: int = 1,
    mode: Literal["auto", "tool", "prompt"] = "auto",
    prompt_suffix: str | None = None,
    coerce: bool = True,
)
MethodReturnsNotes
.run(prompt, system=None)StructuredOutputResultBlocking; full retry loop.
.stream(prompt, system=None)Iterator[Any]Yields partials; retry NOT applied.

parse_partial_json(text: str) -> Any

Best-effort partial-JSON parser. Returns the parsed object, or None if no recoverable structure is present.


Beat LangChain in three lines

CapabilityLangChainshipit_agent v1.0.8
Validation retry uses original conversation❌ separate LLM call✅ same conversation
Streaming partial JSON yields❌ only on completion✅ token-by-token
Schema input: Pydantic + JSON Schema dict
One-line wiring on existing Agent❌ wrap in RunnableWith…output_schema= kwarg

Common pitfalls

  • Bedrock + system-prompt schema injection — earlier versions returned empty content when schema instructions polluted the system prompt. v1.0.8 always appends the schema to the user prompt, so Bedrock's Converse API works without adjustment.
  • Retry budget too low — when running long agent loops with complex schemas, bump max_validation_retries=4 if you see frequent result.parsed = None warnings.
  • coerce=False — disables the partial-JSON-recovery fallback in StructuredOutput. Use this only if you want strict JSON-only validation; otherwise leave it on.

Going deeper