Overview

A clean, powerful Python agent library with tools, MCP, streaming events, reasoning capture, and runtime policies.

9 min read
8 sections
Edit this page

Reasoning visibility

Stream live thinking blocks, tool calls, and outputs without a custom runtime.

Provider flexibility

Swap LLM providers with one config change while keeping your tools intact.

Production guardrails

Built-in retry policies, error recovery, hooks, and parallel execution.

MCP and tools

Mix Python tools, remote MCP servers, and connector-style integrations.

Start building powerful agents

The Shipit framework provides everything you need to build, deploy, and scale production-ready AI agents.

v1.0.8 — Five flagship features that genuinely beat LangChain

  1. Structured output with auto-retry — same-conversation validation retry (LangChain's OutputFixingParser runs a separate LLM call). Streaming partial JSON. Pydantic + JSON Schema, both work.
  2. Verifier network — a second cheap LLM vetoes hallucinated tool calls and detects stalling. Process supervision in one constructor argument.
  3. Episodic memory consolidation — distill conversations into durable facts; forgetting curve + core memory promotion. ChatGPT-style memory, principled, self-hostable.
  4. Time-travel replay — load any saved trace, fork from any event, edit the prompt, resume on a fresh agent. Replay.io for AI agents.
  5. ComputerUseAgent — drive a browser by showing screenshots to a vision-capable LLM. Anthropic native + plain-text fallback for any vision LLM. Self-hosted Devin/Operator.

1508 tests passing (+318 new), zero regressions. All five features available on both Agent and DeepAgent from a single from shipit_agent import ….

v1.0.7 — Agents for every role

12 new tools (GitHub, GitLab, SQL, Vision, PDF, Figma, Salesforce, Stripe, Sheets, Zendesk, LinkedIn read-only) + 7 persona specialists

  • LangSmith / OpenTelemetry trace exporters.

v1.0.6 — Bulletproof 24h Autopilot

Long-running, budget-gated, checkpointed runs that survive crashes. Fan-out parallelism, reflection critic, artifact collection.

SHIPIT Agent is a standalone Python agent library focused on a clean, production-grade runtime:

  • Bring your own LLM — or use any of nine built-in provider adapters (OpenAI, Anthropic, Bedrock, Gemini, Vertex, Groq, Together, Ollama, LiteLLM)
  • Tools, MCP, connectors — Python tools, remote MCP servers, 18 SaaS connectors (Gmail, Drive, Slack, GitHub, GitLab, Linear, Jira, Notion, Confluence, Salesforce, Stripe, Figma, Sheets, Zendesk, LinkedIn, PDF, SQL, custom HTTP APIs)
  • Five flagship power features (v1.0.8) — structured output with auto-retry, verifier network, memory consolidation, time-travel replay, computer-use agent
  • Skills marketplace — packaged or custom skills steer agent behavior and reusable workflows
  • Streaming events — including reasoning / thinking blocks as they happen
  • Tracing — file-based, OpenTelemetry, or LangSmith out of the box
  • Cost tracking + budgets — every LLM call accounted; hard caps per run
  • Autopilot — 24-hour budget-gated runs that survive crashes via JSON checkpoints
  • Multi-agent — Supervisor/Worker pattern, AgentTeam orchestration, sub-agent delegation
  • DeepAgent — power-user wrapper with verify / reflect / goal / RAG layers

Built for developers building real production agents — agents that beat ChatGPT and Operator on observability, cost control, self-hosting, and breadth of integration. Library-level API the whole way down.


Install

bash
pip install shipit-agent

With optional extras:

bash
pip install 'shipit-agent[openai]'         # OpenAI SDK
pip install 'shipit-agent[anthropic]'      # Anthropic SDK (native thinking blocks)
pip install 'shipit-agent[litellm]'        # LiteLLM (Bedrock, Gemini, Groq, Together, …)
pip install 'shipit-agent[playwright]'     # In-process browser for open_url and web_search
pip install 'shipit-agent[all]'            # Everything

30-second example

python
from shipit_agent import Agent
from shipit_agent.llms import OpenAIChatLLM

agent = Agent.with_builtins(llm=OpenAIChatLLM(model="gpt-4o-mini"))

for event in agent.stream("Search the web for today's Bitcoin price in USD."):
    print(event.type, event.message)

Works with any provider — swap OpenAIChatLLM for AnthropicChatLLM, BedrockChatLLM, VertexAIChatLLM, GeminiChatLLM, GroqChatLLM, TogetherChatLLM, OllamaChatLLM, or LiteLLMChatLLM, or use build_llm_from_env() to pick from .env. The agent code never changes. See LLM providers — use any model.

Emits events like:

bash
run_started           Agent run started
step_started          LLM completion started
reasoning_started     🧠 Model reasoning started
reasoning_completed   🧠 Model reasoning completed
tool_called           Tool called: web_search
tool_completed        Tool completed: web_search
run_completed         Agent run completed

Power features (v1.0.8)

Five flagship capabilities, all reachable from from shipit_agent import …, all available on both Agent AND DeepAgent.

Structured output with auto-retry

Pass a Pydantic model or JSON Schema to agent.run(output_schema=…). Get a typed result.parsed back. On parse failure, the runtime retries inside the same conversation — no separate fixing LLM. Streams partial JSON as tokens arrive.

Verifier network — process supervision

A second cheap LLM vetoes hallucinated tool calls before they fire, and detects when the agent is stalling. Both checks fail open (verifier failures never block the agent). One constructor arg.

Episodic memory consolidation

Distill conversations into 3-8 durable facts. Forgetting curve so old facts decay. Frequently-retrieved facts promote to a "core memory" set visible in the system prompt. ChatGPT-style memory, principled.

Time-travel replay

Load any saved trace, fork from any event, edit the prompt, resume on a fresh agent. Side-by-side diff of two runs. Replay.io for AI agents — open, library-level, no SaaS required.

Browser automation (ComputerUseAgent)

Drive a real browser by showing screenshots to a vision-capable LLM. Anthropic native + plain-text fallback for any vision LLM. Use it standalone or plug it into your main Agent as a browser_use tool.

+337 unit tests · 1527 passing · 0 regressions — every feature is reachable from a single from shipit_agent import … and works on both Agent and DeepAgent.


Why SHIPIT Agent

Live reasoning events

Extended thinking blocks from o1/o3/gpt-5/Claude/gpt-oss are automatically extracted and streamed as reasoning_started / reasoning_completed events. Your UI can show a live "Thinking" panel for free.

Truly incremental streaming

agent.stream() runs the agent on a background thread and yields events through a queue as they happen. Works in Jupyter, VS Code, WebSocket, SSE, and terminals.

Bulletproof Bedrock tool pairing

Every toolUse gets a paired toolResult. Planner output is injected as user context, not orphan tool-results. Hallucinated tool names get synthetic error results. Multi-iteration Bedrock loops just work.

Semantic tool discovery

tool_search lets the agent ask "which tool should I use for X?" and get a ranked shortlist. No more 28-tool context bloat, no more tool hallucinations.

Zero-friction provider switching

Edit one line in .envSHIPIT_LLM_PROVIDER=openai — and build_llm_from_env() does the rest. Seven providers supported out of the box.

Playwright-powered open_url

In-process Chromium fetches JS-rendered pages with a realistic UA, handles anti-bot 503s, and falls back to stdlib urllib if Playwright isn't installed. No external scraper services.

Parallel tool execution

When the LLM returns multiple tool calls, run them concurrently with parallel_tool_execution=True. Results stay in order. Typically 2-3x faster for multi-tool turns.

Hooks & middleware

AgentHooks with @on_before_llm, @on_after_llm, @on_before_tool, @on_after_tool for cost tracking, rate limiting, content filtering, and guardrails. No subclassing.

Async runtime

AsyncAgentRuntime with async run() and async stream() for FastAPI, Starlette, and modern async Python. Same features as the sync runtime.

Graceful error recovery

Tool failures produce error messages instead of crashing the run. The LLM sees the error and can try a different approach. Safer retry defaults prevent retrying on bugs.


Next steps


Try it now — runnable examples

The repo ships with 7 numbered, copy-pasteable examples covering every major feature. Pick one and run it in 30 seconds.

#WhatRun
1Hello, agent. The shortest possible runnable examplepython examples/01_hello_agent.py
2Live streaming with colored reasoning eventspython examples/02_streaming_with_reasoning.py
3Same agent, 5 different LLM providers back-to-backpython examples/03_provider_swap.py
4End-to-end research workflow with web search + URL fetchingpython examples/04_research_agent.py "your question"
5Custom tools — function-style and class-stylepython examples/05_custom_tool.py
6Persistent chat session with file-backed memorypython examples/06_chat_session.py
7Semantic tool discovery with tool_searchpython examples/07_tool_search.py

See the full examples README →


Provider compatibility matrix

ProviderReasoning blocksTool callingStreamingBedrock pairingBuilt-in tools
OpenAI (o1, o3, o4, gpt-5)✅ Nativen/a
OpenAI (gpt-4o, gpt-4o-mini)n/a
Anthropic (claude-opus-4, claude-3.7)✅ Native (with thinking_budget_tokens)n/a
AWS Bedrock (gpt-oss-120b)✅ Via LiteLLM✅ Bulletproof
AWS Bedrock (anthropic.claude-*)✅ Via LiteLLM✅ Bulletproof
Google Gemini (gemini-1.5-pro)n/a
Google Vertex AIn/a
Groq (llama-3.3-70b)n/a
Together AIn/a
Ollama (local)n/a
DeepSeek R1 (via LiteLLM proxy)✅ Nativen/a
LiteLLM Proxy (self-hosted gateway)✅ Pass-throughn/a

Tip: if you want a "Thinking" panel UI without paying for o1/Claude, AWS Bedrock's openai.gpt-oss-120b-1:0 is the cheapest reasoning-capable model in the matrix and ships with Agent.with_builtins(llm=BedrockChatLLM()) out of the box.


What you get vs. what you don't

✅ shipit-agent does❌ shipit-agent does NOT do
Run agents with tools, MCP, memory, sessionsTrain models or fine-tune
Stream events incrementally as they happenProvide a hosted control plane
Extract reasoning blocks from any providerReplace LangChain / LangGraph / CrewAI wholesale
Guarantee Bedrock tool-pairing correctnessManage your cloud infrastructure
Support 9 LLM providers via one APILock you into a specific vendor
Ship with 28+ built-in toolsForce you to use any of them
Stay out of your way (small, focused runtime)Hide the agent loop behind abstractions

This is a library, not a framework. The runtime is small enough to read in one sitting (shipit_agent/runtime.py is under 400 lines). Bring your own LLM, tools, and storage; the runtime composes them and gets out of the way.