Streaming events

agent.stream() yields AgentEvent objects as the runtime executes. Under the hood, the agent runs on a background thread and pushes events through a thread-safe queue, so each event reaches your loop as soon as it is emitted. Events are never held back and delivered in a batch at the end of the run.
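The thread-plus-queue handoff described above can be sketched with the standard library alone. This is a minimal illustration of the pattern, not shipit_agent's actual implementation; stream_events, fake_run, and the _DONE sentinel are hypothetical names introduced for the example.

```python
import queue
import threading
from typing import Callable, Iterator

_DONE = object()  # sentinel that marks the end of the stream


def stream_events(produce: Callable[[Callable[[str], None]], None]) -> Iterator[str]:
    """Run produce(emit) on a worker thread and yield each emitted item."""
    q: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            produce(q.put)  # every emit() call enqueues one event
        finally:
            q.put(_DONE)    # always unblock the consumer

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()      # blocks until the next event is available
        if item is _DONE:
            return
        yield item


def fake_run(emit: Callable[[str], None]) -> None:
    # Hypothetical stand-in for the agent loop.
    for name in ("run_started", "step_started", "run_completed"):
        emit(name)


events = list(stream_events(fake_run))
print(events)
```

Because the consumer blocks on q.get(), it wakes up once per event instead of waiting for the whole run to finish.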

Basic usage

python

from shipit_agent import Agent
from examples.run_multi_tool_agent import build_llm_from_env

agent = Agent.with_builtins(llm=build_llm_from_env('openai'))

for event in agent.stream("Find today's Bitcoin price in USD."):
    print(f"{event.type:22s} {event.message}")

Event reference

Event type	When it fires	Key payload fields
`run_started`	Very first event of a run, once per `stream()`/`run()` call.	`prompt`
`mcp_attached`	Once per attached MCP server, right after `run_started`.	`server`
`planning_started`	Router policy decided the prompt is complex enough to invoke `plan_task`. Fires before the first LLM call.	`prompt`
`planning_completed`	Planner returned. Output is injected into history as a `user`-role context message (Bedrock tool-pairing safe).	`output`
`step_started`	Each iteration of the tool loop, right before calling the LLM.	`iteration`, `tool_count`
`reasoning_started`	🧠 LLM response contained a thinking/reasoning block.	`iteration`
`reasoning_completed`	Immediately after `reasoning_started`, carrying the full reasoning text.	`iteration`, `content`
`tool_called`	Model decided to call a tool. Fires before execution.	`tool`, `call_id`, `iteration`, `arguments`
`tool_completed`	Tool finished successfully.	`tool`, `call_id`, `iteration`, `output`, `duration_ms`
`tool_retry`	Transient tool failure, retry scheduled by `RetryPolicy`.	`tool`, `call_id`, `iteration`, `attempt`, `error`
`tool_failed`	Non-retryable tool error, or model hallucinated an unregistered tool name (synthetic error result still appended for pairing balance).	`tool`, `call_id`, `iteration`, `error`, `duration_ms`
`llm_retry`	Transient LLM provider error, retry scheduled.	`attempt`, `error`
`interactive_request`	A tool returned `metadata.interactive=True` (e.g. `ask_user`, human review). UI can pause and collect input.	`kind`, `payload`
`context_compacted`	Older turns were condensed to stay inside the context window (v1.0.15).	`before`, `after`, `iteration`
`run_completed`	Final event. Fires once the loop exits or hits the iteration cap.	`output`, `content`, `format`
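As an illustration of consuming these payloads, the sketch below aggregates per-tool outcomes and latency from tool_completed and tool_failed events. It uses a stand-in Event class rather than the real AgentEvent so that it runs without the library; Event, tool_stats, and the sample data are hypothetical, and only the payload field names come from the table above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    """Stand-in for AgentEvent, reduced to the fields used here."""
    type: str
    payload: dict[str, Any] = field(default_factory=dict)


def tool_stats(events):
    """Aggregate per-tool success/failure counts and total duration."""
    stats = defaultdict(lambda: {"ok": 0, "failed": 0, "total_ms": 0})
    for ev in events:
        if ev.type not in ("tool_completed", "tool_failed"):
            continue  # only these two event types carry duration_ms
        entry = stats[ev.payload["tool"]]
        entry["ok" if ev.type == "tool_completed" else "failed"] += 1
        entry["total_ms"] += ev.payload.get("duration_ms", 0)
    return dict(stats)


# Made-up sample events for demonstration only.
sample = [
    Event("run_started", {"prompt": "demo"}),
    Event("tool_completed", {"tool": "web_search", "call_id": "c1", "duration_ms": 120}),
    Event("tool_failed", {"tool": "open_url", "call_id": "c2", "error": "timeout", "duration_ms": 300}),
    Event("tool_completed", {"tool": "open_url", "call_id": "c3", "duration_ms": 80}),
]
stats = tool_stats(sample)
print(stats)
```

With the real library, the same function should work on the events yielded by agent.stream(), assuming the payloads match the table.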

Event structure

python

from dataclasses import dataclass
from typing import Any


@dataclass
class AgentEvent:
    type: str                # e.g. "tool_called"
    message: str             # human-readable, e.g. "Tool called: web_search"
    payload: dict[str, Any]  # event-specific fields
    timestamp: float         # unix time the event fired (v1.0.15)

Serialize with event.to_dict() for WebSocket/SSE transport.

Live-updatable tool cards (v1.0.15): the four tool events share a call_id, so a UI can render one card on tool_called and update it in place — running → ✓ with duration_ms — when the matching tool_completed or tool_failed arrives.
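One way a UI might keep such cards is a dictionary keyed by call_id that every tool event updates in place. The sketch below is an assumption about how a consumer could do this, not part of the library: apply_tool_event, the card layout, and the sample trace are hypothetical, and events are passed as plain (type, payload) pairs so the example runs on its own.

```python
def apply_tool_event(cards: dict, etype: str, payload: dict) -> None:
    """Create or update the card keyed by payload["call_id"]."""
    card = cards.setdefault(
        payload["call_id"],
        {"tool": payload["tool"], "status": "running", "duration_ms": None},
    )
    if etype == "tool_retry":
        card["status"] = f"retrying (attempt {payload['attempt']})"
    elif etype == "tool_completed":
        card["status"] = "ok"
        card["duration_ms"] = payload["duration_ms"]
    elif etype == "tool_failed":
        card["status"] = "failed"
        card["duration_ms"] = payload["duration_ms"]


# Made-up trace: two calls, one succeeds, one fails after a retry.
trace = [
    ("tool_called", {"tool": "web_search", "call_id": "c1", "iteration": 1, "arguments": {}}),
    ("tool_called", {"tool": "open_url", "call_id": "c2", "iteration": 1, "arguments": {}}),
    ("tool_retry", {"tool": "open_url", "call_id": "c2", "iteration": 1, "attempt": 1, "error": "timeout"}),
    ("tool_completed", {"tool": "web_search", "call_id": "c1", "iteration": 1, "output": "ok", "duration_ms": 228}),
    ("tool_failed", {"tool": "open_url", "call_id": "c2", "iteration": 1, "error": "timeout", "duration_ms": 900}),
]

cards: dict = {}
for etype, payload in trace:
    apply_tool_event(cards, etype, payload)
print(cards)
```

Using setdefault means a card still appears if a terminal event arrives without a preceding tool_called.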

Clean rendering — the Claude-Code experience

The one-call version: agent.run_live(prompt) streams tokens as they're generated, renders tool calls as ⚙ cards, and closes with a summary footer — then returns the final answer text:

python

answer = agent.run_live("Close Q2 and hand me the workbook.")

# ⚙ build_document(kind="xlsx", title="Q2 Close", …) …
# ⚙ build_document ✓ 228ms
#   └ Created XLSX 'Q2 Close' → q2_close.xlsx (5,108 bytes)
# The workbook is ready — net income formula included.   ← token-by-token
# ✔ done · 1 tool call

Token streaming is real in all three native adapters — OpenAIChatLLM (including Gemma 4 on Bedrock mantle, Groq, and any OpenAI-compatible endpoint), AnthropicChatLLM, and the LiteLLM adapter. Adapters that can't stream still work: the answer prints once at the end.

style="rich" (automatic on TTYs) draws Claude-Code-style ⏺/⎿ cards with ANSI colors — tool name in cyan, ✓ green, ✗ red, durations dimmed:

bash

⏺ build_document(kind="xlsx", title="Parity Check", …)
  ⎿ Created XLSX 'Parity Check' → parity_check.xlsx (5,016 bytes) ✓ 294ms
The workbook has been created with sheet S1.
✔ done · 1 tool call

Need to stop a run mid-flight? agent.cancel() (thread-safe, from any thread) halts at the next checkpoint, emits run_cancelled, and returns normally with metadata["cancelled"].
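The checkpoint behaviour can be pictured as a cooperative flag that the run loop polls between steps. The following is a generic sketch of that idea using threading.Event; it is not the library's implementation, and CancellableLoop and its return shape are hypothetical.

```python
import threading


class CancellableLoop:
    """Toy run loop with a thread-safe, cooperative cancel()."""

    def __init__(self) -> None:
        self._cancelled = threading.Event()

    def cancel(self) -> None:
        self._cancelled.set()  # safe to call from any thread

    def run(self, steps) -> dict:
        events = []
        for step in steps:
            if self._cancelled.is_set():  # checkpoint before each step
                events.append("run_cancelled")
                return {"events": events, "metadata": {"cancelled": True}}
            step()
            events.append("step_completed")
        events.append("run_completed")
        return {"events": events, "metadata": {"cancelled": False}}


loop = CancellableLoop()
# The second step requests cancellation to keep the demo deterministic;
# in a real program another thread (for example a UI handler) would call
# loop.cancel() instead.
result = loop.run([lambda: None, loop.cancel, lambda: None])
print(result["events"])
```

The step that is already running finishes normally; the loop only stops when it reaches the next checkpoint, which is why cancellation returns normally rather than raising.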

For custom loops, StreamRenderer does the interleaving (tokens inline, cards on their own lines, newline management) while you keep control of the events:

python

from shipit_agent import StreamRenderer, format_activity, format_event_line

renderer = StreamRenderer()                 # or StreamRenderer(file=buf)
for event in agent.stream("Close Q2 and hand me the workbook."):
    renderer.feed(event)                    # render + your own handling here
renderer.close()

And for after-the-fact rendering, format_event_line(event) gives one clean line per user-facing event, format_activity(result) renders a finished run as one trace, and result.summary() returns the metrics dict (duration, iterations, per-tool ms).

Typical event trace

A Bedrock gpt-oss-120b run with three tool calls (one web_search, two open_url):

bash

1.  run_started          Agent run started
2.  step_started         iteration=1, tool_count=28
3.  reasoning_started    🧠 iteration=1
4.  reasoning_completed  🧠 "The user wants two BTC price sources. I'll start with web_search..."
5.  tool_called          Tool called: web_search
6.  tool_completed       Tool completed: web_search
7.  step_started         iteration=2
8.  reasoning_completed  🧠 "Now I'll open both URLs to confirm..."
9.  tool_called          Tool called: open_url
10. tool_completed       Tool completed: open_url
11. tool_called          Tool called: open_url
12. tool_completed       Tool completed: open_url
13. step_started         iteration=3
14. reasoning_completed  🧠 "Both sources agree within $40..."
15. run_completed        "**Bitcoin Price — 2026-04-09** ..."

Live UI updates in Jupyter

python

from IPython.display import Markdown, clear_output, display

prompt = "Find today's Bitcoin price in USD."  # any prompt; `agent` comes from Basic usage
lines = []
for event in agent.stream(prompt):
    lines.append(f"{event.type} — {event.message}")
    clear_output(wait=True)
    display(Markdown("## Live Stream\n\n" + "\n".join(lines)))

This pattern calls clear_output(wait=True) before each display(...), so the old output is only cleared once the new output is ready. That gives reliable incremental rendering in Jupyter, VS Code, and JupyterLab.

WebSocket/SSE packet transports

python

session = agent.chat_session(session_id='demo')

for packet in session.stream_packets("Research Bitcoin", transport='websocket'):
    print(packet)   # serialized AgentEvent dict

for packet in session.stream_packets("Research Bitcoin", transport='sse'):
    print(packet)   # SSE-formatted string

Both transports yield packets incrementally — no buffering.
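For readers building their own transport, the sketch below shows how an event dict could be framed for each channel using the standard SSE wire format (an event: line, a data: line, and a blank line). The exact packets that stream_packets produces are not shown in this page, so to_sse, to_ws, and the sample event are assumptions made for illustration.

```python
import json


def to_ws(event: dict) -> str:
    """Serialize an event dict as a JSON text frame for a WebSocket."""
    return json.dumps(event)


def to_sse(event: dict) -> str:
    """Frame an event dict as one Server-Sent Events message."""
    # json.dumps escapes newlines, so the payload stays on one data: line.
    data = json.dumps(event, separators=(",", ":"))
    return f"event: {event['type']}\ndata: {data}\n\n"


# Made-up event in the shape suggested by AgentEvent.to_dict().
sample_event = {
    "type": "tool_called",
    "message": "Tool called: web_search",
    "payload": {"tool": "web_search", "call_id": "c1"},
}
frame = to_sse(sample_event)
print(frame, end="")
```

The trailing blank line is what tells an SSE client that the message is complete.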

Error handling

Errors raised during the run (LLM provider exceptions, non-retryable tool failures) are captured on the background worker thread and re-raised on the consumer thread when the stream terminates. Nothing gets silently swallowed.

python

try:
    for event in agent.stream(prompt):
        ...
except RuntimeError as exc:
    print("Agent run failed:", exc)
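The capture-and-re-raise behaviour can be sketched as follows: the worker tags each queue item, and the consumer raises the captured exception when it reaches it. This is a generic standard-library illustration, not the library's code; stream_with_errors and failing_run are hypothetical names.

```python
import queue
import threading


def stream_with_errors(produce):
    """Yield items emitted by produce(emit); re-raise worker errors here."""
    q: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            produce(lambda item: q.put(("event", item)))
            q.put(("done", None))
        except Exception as exc:      # capture on the worker thread
            q.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    while True:
        kind, value = q.get()
        if kind == "done":
            return
        if kind == "error":
            raise value               # surfaces on the consumer thread
        yield value


def failing_run(emit) -> None:
    # Hypothetical run that fails after two events.
    emit("run_started")
    emit("step_started")
    raise RuntimeError("provider unavailable")


seen, error = [], None
try:
    for ev in stream_with_errors(failing_run):
        seen.append(ev)
except RuntimeError as exc:
    error = str(exc)
print(seen, error)
```

Events emitted before the failure are still delivered in order, and the exception arrives only after them.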

Related

Reasoning guide: how reasoning events are extracted from providers
Event types reference: full payload schemas
