Overview
A clean, powerful Python agent library with tools, MCP, streaming events, reasoning capture, and runtime policies.
Reasoning visibility
Stream live thinking blocks, tool calls, and outputs without a custom runtime.
Provider flexibility
Swap LLM providers with one config change while keeping your tools intact.
Production guardrails
Built-in retry policies, error recovery, hooks, and parallel execution.
MCP and tools
Mix Python tools, remote MCP servers, and connector-style integrations.
Start building powerful agents
SHIPIT Agent gives you the runtime pieces to build production-ready AI agents that you deploy and scale on your own infrastructure.
v1.0.3 — Super RAG, DeepAgent, live chat REPL
New in 1.0.3: a Super RAG subsystem (hybrid search, auto-cited sources), a DeepAgent factory with verify / reflect / goal / sub-agents, shipit chat (a live multi-agent terminal REPL), and the Agent memory cookbook. 521 unit tests and 19 end-to-end smoke tests against real Bedrock, all passing. See the changelog.
SHIPIT Agent is a standalone Python agent library focused on a clean runtime:
- bring your own LLM — or use any of seven built-in provider adapters
- attach Python tools, remote MCP servers, or connector-style third-party tools (Gmail, Drive, Slack, Linear, Notion, Jira, Confluence)
- attach packaged or custom skills to steer agent behavior and reusable workflows
- run tool-using agent loops with configurable retry and router policies
- stream structured events (including reasoning / thinking blocks) as they happen
- inspect every step: reasoning, tool arguments, tool outputs, retries, final answer
- compose reusable agent profiles with system prompts and tool selections locked in
- keep clean boundaries between runtime, tools, MCP, policies, and profiles
Built for developers who want the agent loop observable, interchangeable, and out of the way.
Install
```bash
pip install shipit-agent
```
With optional extras:
```bash
pip install 'shipit-agent[openai]'      # OpenAI SDK
pip install 'shipit-agent[anthropic]'   # Anthropic SDK (native thinking blocks)
pip install 'shipit-agent[litellm]'     # LiteLLM (Bedrock, Gemini, Groq, Together, …)
pip install 'shipit-agent[playwright]'  # In-process browser for open_url and web_search
pip install 'shipit-agent[all]'         # Everything
```
30-second example
```python
from shipit_agent import Agent
from shipit_agent.llms import OpenAIChatLLM

agent = Agent.with_builtins(llm=OpenAIChatLLM(model="gpt-4o-mini"))

for event in agent.stream("Search the web for today's Bitcoin price in USD."):
    print(event.type, event.message)
```
This emits events like:
```text
run_started Agent run started
step_started LLM completion started
reasoning_started 🧠 Model reasoning started
reasoning_completed 🧠 Model reasoning completed
tool_called Tool called: web_search
tool_completed Tool completed: web_search
run_completed Agent run completed
```
Why SHIPIT Agent
Live reasoning events
Extended thinking blocks from o1/o3/gpt-5/Claude/gpt-oss are automatically extracted and streamed as reasoning_started / reasoning_completed events, so your UI can show a live "Thinking" panel without any provider-specific parsing.
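As a rough illustration of what extraction involves, the sketch below turns a provider response containing thinking blocks into a reasoning_started / reasoning_completed pair. The response shape (a list of content blocks tagged "thinking" or "text", loosely modeled on Anthropic-style content blocks) and the Event dataclass are assumptions for illustration, not shipit_agent's internal types.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    # Hypothetical event record: a type string, a human-readable message, and a payload.
    type: str
    message: str
    payload: dict[str, Any] = field(default_factory=dict)


def reasoning_events(content_blocks: list[dict[str, Any]]) -> list[Event]:
    """Emit a started/completed pair for every thinking block in a response.

    Assumes each block is a dict with a "type" key; "thinking" blocks carry
    their text under "thinking", and all other blocks are ignored here.
    """
    events: list[Event] = []
    for block in content_blocks:
        if block.get("type") != "thinking":
            continue
        events.append(Event("reasoning_started", "🧠 Model reasoning started"))
        events.append(
            Event(
                "reasoning_completed",
                "🧠 Model reasoning completed",
                {"content": block.get("thinking", "")},
            )
        )
    return events


response = [
    {"type": "thinking", "thinking": "The user wants a price; call web_search."},
    {"type": "text", "text": "Let me look that up."},
]
for event in reasoning_events(response):
    print(event.type, event.message)
```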
Truly incremental streaming
agent.stream() runs the agent on a background thread and yields events through a queue as they happen. It works in Jupyter notebooks, VS Code, and terminals, and behind WebSocket or SSE endpoints.
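The thread-plus-queue pattern can be sketched with the standard library alone. stream_from_thread and fake_agent_run are hypothetical names; this shows the general technique, not shipit_agent's actual runtime code.

```python
import queue
import threading
from typing import Any, Callable, Iterator

_DONE = object()  # sentinel that marks the end of the stream


def stream_from_thread(run: Callable[[Callable[[Any], None]], None]) -> Iterator[Any]:
    """Run run(emit) on a worker thread and yield each emitted event as it arrives."""
    events: "queue.Queue[Any]" = queue.Queue()
    errors: list[Exception] = []

    def worker() -> None:
        try:
            run(events.put)  # the agent loop calls emit(event) as it progresses
        except Exception as exc:
            errors.append(exc)  # hand the failure over to the consuming thread
        finally:
            events.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = events.get()  # blocks until the worker emits something
        if item is _DONE:
            break
        yield item
    if errors:
        raise errors[0]


def fake_agent_run(emit: Callable[[Any], None]) -> None:
    # Hypothetical stand-in for an agent loop that emits events while it works.
    for name in ("run_started", "step_started", "run_completed"):
        emit(name)


for event in stream_from_thread(fake_agent_run):
    print(event)
```

Because the consumer only blocks on queue.get(), each event reaches the caller as soon as the worker emits it rather than after the run finishes.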
Bulletproof Bedrock tool pairing
Every toolUse gets a paired toolResult. Planner output is injected as user context rather than as orphaned tool results. Hallucinated tool names get synthetic error results. Multi-iteration Bedrock loops just work.
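The pairing invariant can be sketched as a small function over Bedrock Converse-style content blocks (toolUse / toolResult dicts with toolUseId, content, and status keys). pair_tool_results is a hypothetical helper that illustrates the invariant; it is not shipit_agent's implementation.

```python
from typing import Any


def pair_tool_results(
    tool_uses: list[dict[str, Any]],
    results: dict[str, str],
    known_tools: set[str],
) -> list[dict[str, Any]]:
    """Return exactly one toolResult block per toolUse block, in the same order."""
    paired: list[dict[str, Any]] = []
    for block in tool_uses:
        use = block["toolUse"]
        use_id, name = use["toolUseId"], use["name"]
        if name not in known_tools:
            # Hallucinated tool name: answer with a synthetic error instead of dropping it.
            text, status = f"Unknown tool '{name}'.", "error"
        elif use_id not in results:
            # The tool never produced output: still close the pair so the turn stays valid.
            text, status = f"No result recorded for '{name}'.", "error"
        else:
            text, status = results[use_id], "success"
        paired.append(
            {"toolResult": {"toolUseId": use_id, "content": [{"text": text}], "status": status}}
        )
    return paired


uses = [
    {"toolUse": {"toolUseId": "a", "name": "web_search", "input": {}}},
    {"toolUse": {"toolUseId": "b", "name": "made_up_tool", "input": {}}},
]
print(pair_tool_results(uses, {"a": "BTC price page"}, {"web_search", "open_url"}))
```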
Semantic tool discovery
tool_search lets the agent ask "which tool should I use for X?" and get back a ranked shortlist, so the prompt no longer has to carry all 28+ tool definitions and the model is less likely to hallucinate tool names.
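To illustrate the ranking idea only, the sketch below scores tool descriptions against a query with bag-of-words cosine similarity. This is a swapped-in technique: the text above does not say how tool_search computes relevance (embeddings are a common choice), and search_tools and the TOOLS descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    # Lowercased word counts; a production system would more likely use embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_tools(query: str, tools: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank tools by similarity between the query and each tool description."""
    q = _vector(query)
    scored = [(name, _cosine(q, _vector(desc))) for name, desc in tools.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scored[:top_k] if pair[1] > 0]


TOOLS = {
    "web_search": "Search the web for current information and news",
    "open_url": "Fetch and read the contents of a web page URL",
    "run_python": "Execute Python code and return the output",
}
print(search_tools("search the web for the latest news", TOOLS))
```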
Zero-friction provider switching
Edit one line in .env — SHIPIT_LLM_PROVIDER=openai — and build_llm_from_env() does the rest. Seven providers supported out of the box.
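The env-driven switch boils down to a name-to-factory lookup. The sketch below reads SHIPIT_LLM_PROVIDER (the variable named above) but is a hypothetical stand-in for build_llm_from_env(): the registry keys, the factories, and llm_from_env itself are assumptions, and the real function's accepted values and error handling may differ.

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical registry mapping provider names to factories. Real factories
# would construct an LLM client; these return labels so the sketch runs anywhere.
PROVIDERS: dict[str, Callable[[], str]] = {
    "openai": lambda: "OpenAI client",
    "anthropic": lambda: "Anthropic client",
    "bedrock": lambda: "Bedrock client",
}


def llm_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the LLM named by SHIPIT_LLM_PROVIDER, reading os.environ by default."""
    source = os.environ if env is None else env
    name = source.get("SHIPIT_LLM_PROVIDER", "").strip().lower()
    if name not in PROVIDERS:
        raise ValueError(
            f"Unsupported SHIPIT_LLM_PROVIDER={name!r}; expected one of {sorted(PROVIDERS)}"
        )
    return PROVIDERS[name]()


print(llm_from_env({"SHIPIT_LLM_PROVIDER": "openai"}))
```

Keeping tools and agent code independent of the factory is what makes the provider a one-line configuration change.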
Playwright-powered open_url
In-process Chromium fetches JS-rendered pages with a realistic user agent, handles anti-bot 503 responses, and falls back to stdlib urllib if Playwright isn't installed. No external scraper services.
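The fallback order can be sketched as "try each fetcher until one succeeds". fetch_with_fallback, playwright_fetch, and urllib_fetch are hypothetical helpers, not shipit_agent's open_url; the Playwright calls use its public sync API, and the User-Agent string is a placeholder.

```python
import urllib.request
from typing import Callable, Sequence

Fetcher = Callable[[str], str]


def playwright_fetch(url: str) -> str:
    # Importing inside the function means a missing Playwright install raises
    # ImportError here, which the fallback loop treats as "try the next fetcher".
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)  # render the page in Chromium, running its JavaScript
            return page.content()  # serialized DOM after rendering
        finally:
            browser.close()


def urllib_fetch(url: str) -> str:
    # Plain HTTP fetch with a browser-like User-Agent (placeholder); no JS execution.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=15) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_fallback(url: str, fetchers: Sequence[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Try each (name, fetcher) in order and return (name, html) from the first success."""
    failures: list[str] = []
    for name, fetch in fetchers:
        try:
            return name, fetch(url)
        except Exception as exc:  # ImportError, HTTP 503, timeouts, ...
            failures.append(f"{name}: {type(exc).__name__}: {exc}")
    raise RuntimeError("All fetchers failed: " + "; ".join(failures))


# Example usage (performs network I/O):
# source, html = fetch_with_fallback(
#     "https://example.com", [("playwright", playwright_fetch), ("urllib", urllib_fetch)]
# )
```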
Parallel tool execution
When the LLM returns multiple tool calls, run them concurrently with parallel_tool_execution=True. Results stay in order. Typically 2-3x faster for multi-tool turns.
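Order-preserving concurrency can be sketched with the standard library's ThreadPoolExecutor: submit every call, then read the futures back in submission order. run_tool_calls and slow_tool are hypothetical; this is the general pattern rather than how parallel_tool_execution is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

ToolCall = tuple[Callable[..., Any], dict[str, Any]]


def run_tool_calls(calls: list[ToolCall], parallel: bool = True) -> list[Any]:
    """Execute (tool, kwargs) pairs and return results in the original call order."""
    if not parallel or len(calls) < 2:
        return [tool(**kwargs) for tool, kwargs in calls]
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(tool, **kwargs) for tool, kwargs in calls]
        # Read futures in submission order, not completion order.
        return [future.result() for future in futures]


def slow_tool(label: str, delay: float) -> str:
    time.sleep(delay)  # stand-in for network I/O such as an HTTP request
    return label


calls: list[ToolCall] = [
    (slow_tool, {"label": "search", "delay": 0.4}),
    (slow_tool, {"label": "fetch", "delay": 0.2}),
]
start = time.perf_counter()
print(run_tool_calls(calls), f"{time.perf_counter() - start:.2f}s")
```

Threads pay off when tools are I/O-bound (HTTP requests, subprocesses); CPU-bound pure-Python tools gain little because of the GIL.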
Hooks & middleware
AgentHooks with @on_before_llm, @on_after_llm, @on_before_tool, @on_after_tool for cost tracking, rate limiting, content filtering, and guardrails. No subclassing.
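The decorator pattern behind hooks can be sketched in plain Python. HookRegistry, the "before_tool" / "after_tool" point names, and call_tool are hypothetical and do not reproduce AgentHooks' actual signatures; the sketch only shows how decorators register callbacks that wrap each tool call without subclassing.

```python
from collections import defaultdict
from typing import Any, Callable


class HookRegistry:
    """Hypothetical hook registry: callbacks keyed by lifecycle point."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Callable[..., Any]]] = defaultdict(list)

    def on(self, point: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._hooks[point].append(fn)
            return fn

        return decorator

    def fire(self, point: str, **data: Any) -> None:
        for fn in self._hooks[point]:
            fn(**data)


hooks = HookRegistry()
calls_per_tool: dict[str, int] = defaultdict(int)
BLOCKED = {"shell"}


@hooks.on("before_tool")
def guardrail(tool_name: str, **_: Any) -> None:
    # Guardrail: raising here stops the call before the tool runs.
    if tool_name in BLOCKED:
        raise PermissionError(f"Tool '{tool_name}' is blocked by policy")


@hooks.on("after_tool")
def count_calls(tool_name: str, **_: Any) -> None:
    # Lightweight usage tracking, e.g. for cost or rate-limit accounting.
    calls_per_tool[tool_name] += 1


def call_tool(tool_name: str, fn: Callable[[], Any]) -> Any:
    hooks.fire("before_tool", tool_name=tool_name)
    result = fn()
    hooks.fire("after_tool", tool_name=tool_name, result=result)
    return result


print(call_tool("web_search", lambda: "results"), dict(calls_per_tool))
```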
Async runtime
AsyncAgentRuntime with async run() and async stream() for FastAPI, Starlette, and modern async Python. Same features as the sync runtime.
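The async shape can be sketched with an async generator consumed via async for. stream_events and fake_llm_step are hypothetical stand-ins; AsyncAgentRuntime's actual method signatures and event objects are not shown here.

```python
import asyncio
from typing import AsyncIterator


async def fake_llm_step(i: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for an awaited provider call
    return f"step_{i}_completed"


async def stream_events(steps: int) -> AsyncIterator[str]:
    """Async generator: yield each event as soon as its await finishes."""
    yield "run_started"
    for i in range(steps):
        yield await fake_llm_step(i)
    yield "run_completed"


async def main() -> list[str]:
    # In FastAPI or Starlette, the same generator could feed a StreamingResponse.
    return [event async for event in stream_events(2)]


print(asyncio.run(main()))
```

Because nothing blocks the event loop, other requests keep being served while a run is waiting on the provider.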
Graceful error recovery
Tool failures produce error messages instead of crashing the run. The LLM sees the error and can try a different approach. Safer retry defaults avoid retrying errors caused by bugs, which would only fail the same way again.
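A minimal sketch of the recover-and-report pattern: exceptions become error strings the LLM can read, and only errors assumed to be transient (here TimeoutError and ConnectionError) are retried with exponential backoff. call_tool_safely and the TRANSIENT tuple are hypothetical; shipit_agent's actual retry policy classes and defaults may differ.

```python
import time
from typing import Any, Callable

# Assumption: only these exception types are treated as transient and retried.
TRANSIENT: tuple[type[Exception], ...] = (TimeoutError, ConnectionError)


def call_tool_safely(
    fn: Callable[..., Any],
    *args: Any,
    max_retries: int = 2,
    base_delay: float = 0.1,
    **kwargs: Any,
) -> str:
    """Run a tool and always return a string the LLM can read, never an exception."""
    for attempt in range(max_retries + 1):
        try:
            return str(fn(*args, **kwargs))
        except TRANSIENT as exc:
            if attempt == max_retries:
                return f"Error: {type(exc).__name__}: {exc} (gave up after {attempt + 1} attempts)"
            time.sleep(base_delay * 2**attempt)  # exponential backoff before retrying
        except Exception as exc:
            # Likely a bug or bad arguments: retrying would fail the same way.
            return f"Error: {type(exc).__name__}: {exc}"
    raise AssertionError("unreachable")


print(call_tool_safely(lambda: 1 / 0))  # → Error: ZeroDivisionError: division by zero
```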
Next steps
- Install and run the quick start — get an agent running in five minutes
- Explore streaming events — understand the 14 event types and what they carry
- Reasoning and thinking steps — render a live "Thinking" panel in your UI
- Create a custom tool — build a new tool from scratch
- Use skills — packaged skills, custom skills, Agent, and DeepAgent workflows
- MCP integration — attach remote MCP servers to extend capabilities
- Parallel tool execution — speed up multi-tool turns
- Hooks & middleware — add cost tracking, logging, and guardrails
- Async runtime — use with FastAPI and async Python
- Context window management — track tokens and manage context limits
- Error recovery — graceful failure handling and retry policies
Try it now — runnable examples
The repo ships with 7 numbered, copy-pasteable examples covering the core features. Pick one and run it in 30 seconds.
| # | What | Run |
|---|---|---|
| 1 | Hello, agent. The shortest possible runnable example | python examples/01_hello_agent.py |
| 2 | Live streaming with colored reasoning events | python examples/02_streaming_with_reasoning.py |
| 3 | Same agent, 5 different LLM providers back-to-back | python examples/03_provider_swap.py |
| 4 | End-to-end research workflow with web search + URL fetching | python examples/04_research_agent.py "your question" |
| 5 | Custom tools — function-style and class-style | python examples/05_custom_tool.py |
| 6 | Persistent chat session with file-backed memory | python examples/06_chat_session.py |
| 7 | Semantic tool discovery with tool_search | python examples/07_tool_search.py |
See the full examples README →
Provider compatibility matrix
| Provider | Reasoning blocks | Tool calling | Streaming | Bedrock pairing | Built-in tools |
|---|---|---|---|---|---|
| OpenAI (o1, o3, o4, gpt-5) | ✅ Native | ✅ | ✅ | n/a | ✅ |
| OpenAI (gpt-4o, gpt-4o-mini) | ❌ | ✅ | ✅ | n/a | ✅ |
| Anthropic (claude-opus-4, claude-3.7) | ✅ Native (with thinking_budget_tokens) | ✅ | ✅ | n/a | ✅ |
| AWS Bedrock (gpt-oss-120b) | ✅ Via LiteLLM | ✅ | ✅ | ✅ Bulletproof | ✅ |
| AWS Bedrock (anthropic.claude-*) | ✅ Via LiteLLM | ✅ | ✅ | ✅ Bulletproof | ✅ |
| Google Gemini (gemini-1.5-pro) | ❌ | ✅ | ✅ | n/a | ✅ |
| Google Vertex AI | ❌ | ✅ | ✅ | n/a | ✅ |
| Groq (llama-3.3-70b) | ❌ | ✅ | ✅ | n/a | ✅ |
| Together AI | ❌ | ✅ | ✅ | n/a | ✅ |
| Ollama (local) | ❌ | ✅ | ✅ | n/a | ✅ |
| DeepSeek R1 (via LiteLLM proxy) | ✅ Native | ✅ | ✅ | n/a | ✅ |
| LiteLLM Proxy (self-hosted gateway) | ✅ Pass-through | ✅ | ✅ | n/a | ✅ |
Tip: if you want a "Thinking" panel UI without paying for o1/Claude, AWS Bedrock's openai.gpt-oss-120b-1:0 is the cheapest reasoning-capable model in the matrix and works out of the box with Agent.with_builtins(llm=BedrockChatLLM()).
What you get vs. what you don't
| ✅ shipit-agent does | ❌ shipit-agent does NOT do |
|---|---|
| Run agents with tools, MCP, memory, sessions | Train models or fine-tune |
| Stream events incrementally as they happen | Provide a hosted control plane |
| Extract reasoning blocks from any provider that exposes them | Replace LangChain / LangGraph / CrewAI wholesale |
| Guarantee Bedrock tool-pairing correctness | Manage your cloud infrastructure |
| Support 9 LLM providers via one API | Lock you into a specific vendor |
| Ship with 28+ built-in tools | Force you to use any of them |
| Stay out of your way (small, focused runtime) | Hide the agent loop behind abstractions |
This is a library, not a framework. The runtime is small enough to read in one sitting (shipit_agent/runtime.py is under 400 lines). Bring your own LLM, tools, and storage; the runtime composes them and gets out of the way.