Overview

Name: SHIPIT Agent
Author: SHIPIT

A clean, powerful Python agent library with tools, MCP, streaming events, reasoning capture, and runtime policies.

9 min read

8 sections

Edit this page

Reasoning visibility

Stream live thinking blocks, tool calls, and outputs without a custom runtime.

Provider flexibility

Swap LLM providers with one config change while keeping your tools intact.

Production guardrails

Built-in retry policies, error recovery, hooks, and parallel execution.

MCP and tools

Mix Python tools, remote MCP servers, and connector-style integrations.

Start building powerful agents

The Shipit framework provides everything you need to build, deploy, and scale production-ready AI agents.

Quickstart Architecture

v1.0.8 — Five flagship features that genuinely beat LangChain

Structured output with auto-retry — same-conversation validation retry (LangChain's OutputFixingParser runs a separate LLM call). Streaming partial JSON. Pydantic + JSON Schema, both work.
Verifier network — a second cheap LLM vetoes hallucinated tool calls and detects stalling. Process supervision in one constructor argument.
Episodic memory consolidation — distill conversations into durable facts; forgetting curve + core memory promotion. ChatGPT-style memory, principled, self-hostable.
Time-travel replay — load any saved trace, fork from any event, edit the prompt, resume on a fresh agent. Replay.io for AI agents.
ComputerUseAgent — drive a browser by showing screenshots to a vision-capable LLM. Anthropic native + plain-text fallback for any vision LLM. Self-hosted Devin/Operator.

1508 tests passing (+318 new), zero regressions. All five features available on both Agent and DeepAgent from a single from shipit_agent import ….

v1.0.7 — Agents for every role

12 new tools (GitHub, GitLab, SQL, Vision, PDF, Figma, Salesforce, Stripe, Sheets, Zendesk, LinkedIn read-only) + 7 persona specialists

LangSmith / OpenTelemetry trace exporters.

v1.0.6 — Bulletproof 24h Autopilot

Long-running, budget-gated, checkpointed runs that survive crashes. Fan-out parallelism, reflection critic, artifact collection.

SHIPIT Agent is a standalone Python agent library focused on a clean, production-grade runtime:

Bring your own LLM — or use any of nine built-in provider adapters (OpenAI, Anthropic, Bedrock, Gemini, Vertex, Groq, Together, Ollama, LiteLLM)
Tools, MCP, connectors — Python tools, remote MCP servers, 18 SaaS connectors (Gmail, Drive, Slack, GitHub, GitLab, Linear, Jira, Notion, Confluence, Salesforce, Stripe, Figma, Sheets, Zendesk, LinkedIn, PDF, SQL, custom HTTP APIs)
Five flagship power features (v1.0.8) — structured output with auto-retry, verifier network, memory consolidation, time-travel replay, computer-use agent
Skills marketplace — packaged or custom skills steer agent behavior and reusable workflows
Streaming events — including reasoning / thinking blocks as they happen
Tracing — file-based, OpenTelemetry, or LangSmith out of the box
Cost tracking + budgets — every LLM call accounted; hard caps per run
Autopilot — 24-hour budget-gated runs that survive crashes via JSON checkpoints
Multi-agent — Supervisor/Worker pattern, AgentTeam orchestration, sub-agent delegation
DeepAgent — power-user wrapper with verify / reflect / goal / RAG layers

Built for developers building real production agents — agents that beat ChatGPT and Operator on observability, cost control, self-hosting, and breadth of integration. Library-level API the whole way down.

Install

bash

pip install shipit-agent

With optional extras:

bash

pip install 'shipit-agent[openai]'         # OpenAI SDK
pip install 'shipit-agent[anthropic]'      # Anthropic SDK (native thinking blocks)
pip install 'shipit-agent[litellm]'        # LiteLLM (Bedrock, Gemini, Groq, Together, …)
pip install 'shipit-agent[playwright]'     # In-process browser for open_url and web_search
pip install 'shipit-agent[all]'            # Everything

30-second example

python

from shipit_agent import Agent
from shipit_agent.llms import OpenAIChatLLM

agent = Agent.with_builtins(llm=OpenAIChatLLM(model="gpt-4o-mini"))

for event in agent.stream("Search the web for today's Bitcoin price in USD."):
    print(event.type, event.message)

Works with any provider — swap OpenAIChatLLM for AnthropicChatLLM, BedrockChatLLM, VertexAIChatLLM, GeminiChatLLM, GroqChatLLM, TogetherChatLLM, OllamaChatLLM, or LiteLLMChatLLM, or use build_llm_from_env() to pick from .env. The agent code never changes. See LLM providers — use any model.

Emits events like:

bash

run_started           Agent run started
step_started          LLM completion started
reasoning_started     🧠 Model reasoning started
reasoning_completed   🧠 Model reasoning completed
tool_called           Tool called: web_search
tool_completed        Tool completed: web_search
run_completed         Agent run completed

Power features (v1.0.8)

Five flagship capabilities, all reachable from from shipit_agent import …, all available on both Agent AND DeepAgent.

Structured output with auto-retry

Pass a Pydantic model or JSON Schema to agent.run(output_schema=…). Get a typed result.parsed back. On parse failure, the runtime retries inside the same conversation — no separate fixing LLM. Streams partial JSON as tokens arrive.

Structured output

Verifier network — process supervision

A second cheap LLM vetoes hallucinated tool calls before they fire, and detects when the agent is stalling. Both checks fail open (verifier failures never block the agent). One constructor arg.

Verifier network

Episodic memory consolidation

Distill conversations into 3-8 durable facts. Forgetting curve so old facts decay. Frequently-retrieved facts promote to a "core memory" set visible in the system prompt. ChatGPT-style memory, principled.

Memory consolidation

Time-travel replay

Load any saved trace, fork from any event, edit the prompt, resume on a fresh agent. Side-by-side diff of two runs. Replay.io for AI agents — open, library-level, no SaaS required.

Time-travel replay

Browser automation (ComputerUseAgent)

Drive a real browser by showing screenshots to a vision-capable LLM. Anthropic native + plain-text fallback for any vision LLM. Use it standalone or plug it into your main Agent as a browser_use tool.

Browser automation

+337 unit tests · 1527 passing · 0 regressions — every feature is reachable from a single from shipit_agent import … and works on both Agent and DeepAgent.

Why SHIPIT Agent

Live reasoning events

Extended thinking blocks from o1/o3/gpt-5/Claude/gpt-oss are automatically extracted and streamed as reasoning_started / reasoning_completed events. Your UI can show a live "Thinking" panel for free.

Reasoning guide

Truly incremental streaming

agent.stream() runs the agent on a background thread and yields events through a queue as they happen. Works in Jupyter, VS Code, WebSocket, SSE, and terminals.

Streaming guide

Bulletproof Bedrock tool pairing

Every toolUse gets a paired toolResult. Planner output is injected as user context, not orphan tool-results. Hallucinated tool names get synthetic error results. Multi-iteration Bedrock loops just work.

Architecture

Semantic tool discovery

tool_search lets the agent ask "which tool should I use for X?" and get a ranked shortlist. No more 28-tool context bloat, no more tool hallucinations.

Tool search guide

Zero-friction provider switching

Edit one line in .env — SHIPIT_LLM_PROVIDER=openai — and build_llm_from_env() does the rest. Seven providers supported out of the box.

Environment setup

Playwright-powered `open_url`

In-process Chromium fetches JS-rendered pages with a realistic UA, handles anti-bot 503s, and falls back to stdlib urllib if Playwright isn't installed. No external scraper services.

Tool catalog

Parallel tool execution

When the LLM returns multiple tool calls, run them concurrently with parallel_tool_execution=True. Results stay in order. Typically 2-3x faster for multi-tool turns.

Parallel execution guide

Hooks & middleware

AgentHooks with @on_before_llm, @on_after_llm, @on_before_tool, @on_after_tool for cost tracking, rate limiting, content filtering, and guardrails. No subclassing.

Hooks guide

Async runtime

AsyncAgentRuntime with async run() and async stream() for FastAPI, Starlette, and modern async Python. Same features as the sync runtime.

Async guide

Graceful error recovery

Tool failures produce error messages instead of crashing the run. The LLM sees the error and can try a different approach. Safer retry defaults prevent retrying on bugs.

Error recovery guide

Next steps

Install and run the quick start — get an agent running in five minutes
Explore streaming events — understand the 14 event types and what they carry
Reasoning and thinking steps — render a live "Thinking" panel in your UI
Create a custom tool — build a new tool from scratch
Use skills — packaged skills, custom skills, Agent, and DeepAgent workflows
MCP integration — attach remote MCP servers to extend capabilities
Parallel tool execution — speed up multi-tool turns
Hooks & middleware — add cost tracking, logging, and guardrails
Async runtime — use with FastAPI and async Python
Context window management — track tokens and manage context limits
Error recovery — graceful failure handling and retry policies

Try it now — runnable examples

The repo ships with 7 numbered, copy-pasteable examples covering every major feature. Pick one and run it in 30 seconds.

#	What	Run
1	Hello, agent. The shortest possible runnable example	`python examples/01_hello_agent.py`
2	Live streaming with colored reasoning events	`python examples/02_streaming_with_reasoning.py`
3	Same agent, 5 different LLM providers back-to-back	`python examples/03_provider_swap.py`
4	End-to-end research workflow with web search + URL fetching	`python examples/04_research_agent.py "your question"`
5	Custom tools — function-style and class-style	`python examples/05_custom_tool.py`
6	Persistent chat session with file-backed memory	`python examples/06_chat_session.py`
7	Semantic tool discovery with `tool_search`	`python examples/07_tool_search.py`

See the full examples README →

Provider compatibility matrix

Provider	Reasoning blocks	Tool calling	Streaming	Bedrock pairing	Built-in tools
OpenAI (`o1`, `o3`, `o4`, `gpt-5`)	✅ Native	✅	✅	n/a	✅
OpenAI (`gpt-4o`, `gpt-4o-mini`)	❌	✅	✅	n/a	✅
Anthropic (`claude-opus-4`, `claude-3.7`)	✅ Native (with `thinking_budget_tokens`)	✅	✅	n/a	✅
AWS Bedrock (`gpt-oss-120b`)	✅ Via LiteLLM	✅	✅	✅ Bulletproof	✅
AWS Bedrock (`anthropic.claude-*`)	✅ Via LiteLLM	✅	✅	✅ Bulletproof	✅
Google Gemini (`gemini-1.5-pro`)	❌	✅	✅	n/a	✅
Google Vertex AI	❌	✅	✅	n/a	✅
Groq (`llama-3.3-70b`)	❌	✅	✅	n/a	✅
Together AI	❌	✅	✅	n/a	✅
Ollama (local)	❌	✅	✅	n/a	✅
DeepSeek R1 (via LiteLLM proxy)	✅ Native	✅	✅	n/a	✅
LiteLLM Proxy (self-hosted gateway)	✅ Pass-through	✅	✅	n/a	✅

Tip: if you want a "Thinking" panel UI without paying for o1/Claude, AWS Bedrock's openai.gpt-oss-120b-1:0 is the cheapest reasoning-capable model in the matrix and ships with Agent.with_builtins(llm=BedrockChatLLM()) out of the box.

What you get vs. what you don't

✅ shipit-agent does	❌ shipit-agent does NOT do
Run agents with tools, MCP, memory, sessions	Train models or fine-tune
Stream events incrementally as they happen	Provide a hosted control plane
Extract reasoning blocks from any provider	Replace LangChain / LangGraph / CrewAI wholesale
Guarantee Bedrock tool-pairing correctness	Manage your cloud infrastructure
Support 9 LLM providers via one API	Lock you into a specific vendor
Ship with 28+ built-in tools	Force you to use any of them
Stay out of your way (small, focused runtime)	Hide the agent loop behind abstractions

This is a library, not a framework. The runtime is small enough to read in one sitting (shipit_agent/runtime.py is under 400 lines). Bring your own LLM, tools, and storage; the runtime composes them and gets out of the way.

Reasoning visibility

Provider flexibility

Production guardrails

MCP and tools

Start building powerful agents

Install

30-second example

Power features (v1.0.8)

Structured output with auto-retry

Verifier network — process supervision

Episodic memory consolidation

Time-travel replay

Browser automation (ComputerUseAgent)

Why SHIPIT Agent

Live reasoning events

Truly incremental streaming

Bulletproof Bedrock tool pairing

Semantic tool discovery

Zero-friction provider switching

Playwright-powered open_url

Parallel tool execution

Hooks & middleware

Async runtime

Graceful error recovery

Next steps

Try it now — runnable examples

Provider compatibility matrix

What you get vs. what you don't

Playwright-powered `open_url`