The Deep Agent Factory v1.0.12

Build agentsthat think deeply.

Claude API power: server-side tools, document citations, the Batch API, and interleaved thinking — plus cross-provider prompt caching, on top of the v1.0.11 control plane. Bring your own LLM, attach remote MCP servers, stream native thinking.

Live stream — autopilot.stream()
run_started · goal received
iteration 2 · 2/3 criteria met
artifact · iter2-block1.py
critic · confidence=0.87
Get Started
$pip install 'shipit-agent[all]'
1800+Tests Passing
11LLM Providers
30+Built-in Tools
40++Built-in Agents
0Core Dependencies
goal_example.py
Live
MCP: Connected
Python 3.12
42.4 MB
805+ TestsUnit + E2E · all passing
9 ProvidersOpenAI · Anthropic · Bedrock · Gemini…
30+ ToolsBuilt-in & extensible
47 SpecialistsPrebuilt & ready
AutopilotLong-running · v1.0.6
Reflection criticSecond-opinion reviewer
ArtifactsClaude-Desktop-style
Fan-outParallel batches
ShipCrewMulti-agent DAG
100% LocalNo data leaves
Open SourceMIT licensed
Zero DepsOnly Pydantic
Live Examples

See it run. Every pattern, live.

Six lines of Python, six ways to ship. Tap a tab and watch the agent actually work — streaming tokens, calling tools, reasoning in steps.

basic.py
Live
from shipit_agent import Agent
from shipit_agent.llms import build_llm_from_env
 
agent = Agent.with_builtins(llm=build_llm_from_env())
result = agent.run("Summarize this quarter's sales report.")
print(result.output)
output · live
shipit-agent
Basic agent
Flagship features · since v1.0.8

Five flagship features that beat the competition.

Validation retry that stays in your conversation. Process supervision in a single constructor argument. ChatGPT-style memory, principled. Replay.io for AI agents. A self-hostable Devin. All five reachable from from shipit_agent import ….

structured-output.py
from pydantic import BaseModel
from shipit_agent import Agent
 
class Movie(BaseModel):
title: str
rating: float
 
result = agent.run(
"Recommend a thriller.",
output_schema=Movie,
max_validation_retries=2,
)
print(result.parsed) # Movie(title='Heat', rating=8.5)
Why this beats the competition
Beats LangChain

LangChain's OutputFixingParser runs a fresh LLM call. Ours stays in the same conversation, so the model has full context of what it tried.

Available on Agent and DeepAgent
Full Docs
+318 tests·1508 passing·0 regressions·5 docs pages·5 notebooks
v1.0.7 · Autopilot · Agents for every roleNEW

Run for 24 hours. For every role.

Autopilot turns any agent — developer, designer, sales rep, analyst, support — into a budget-gated, checkpointed worker. Cumulative budgets across resume · SIGTERM-safe · dollar tracking · dashboards as artifacts · your own LiteLLM proxy in three fields. Fan out, reflect with a critic, stream every step.

See the full page
live:run_started
6 primitivesBudget-gatedAtomic checkpointsLive streaming805 tests8 notebooksRead more
v1.0.7 · Agents for every roleNEW

From sales to design to ops. One framework.

shipit-agent isn't just for developers anymore. Nine new role specialists ship with the framework — plug any of them into an Agent or an Autopilot and watch them pick up the tools they need.

Code Reviewer Bot

Reviews PRs, cites lines, insists on tests.

githubread_filegrep_filesvision
code-reviewer-bot

Release Engineer

Verifies CI, tags, ships, posts announcement.

githubbashslack
release-engineer

Figma Designer

Reviews designs, extracts tokens, writes handoffs.

figmavisionrender_dashboard
figma-designer

Sales Rep

Enriches leads, drafts outreach, logs to Salesforce.

salesforcelinkedin_searchgmail_search
sales-rep

Account Executive

Pipeline reviews, account health, forecasts.

salesforcelinkedin_searchsql
account-executive

Sales Ops

Data-quality audits, pipeline dashboards.

salesforcesqlgoogle_sheets
sales-ops

Technical Recruiter

Targeted sourcing, short outreach, candidate tracking.

linkedin_searchgoogle_calendargoogle_sheets
recruiter

Finance Analyst

Month-end close, cash flow, contract reviews.

stripepdfsqlrender_dashboard
finance-analyst

Support Agent

Triages tickets, reads screenshots, flags incidents.

zendeskvisionslack
customer-support-agent
9 new personas56+ prebuilt totalLoad by nameDrop-in toolsBrowse all roles
AgentRegistry · 56 specialists

Load any persona by name. One line.

56 prebuilt specialists across code, design, sales, marketing, finance, support. Each one is a profile — name, system prompt, suggested tools, iteration budget. Hand it to your Agent and ship.

load_specialist.py
from shipit_agent.agents import AgentRegistry
registry = AgentRegistry.default()
spec = registry.get("code-reviewer-bot")
code-reviewer-bot.py
from shipit_agent import Agent
from shipit_agent.agents import AgentRegistry
spec = AgentRegistry.default().get("code-reviewer-bot")
agent = Agent.with_builtins(
llm=opus_llm,
prompt=spec.system_prompt(),
max_iterations=spec.max_iterations,
)
result = agent.run(
"Review pull request shipiit/shipit_agent#142. "
"Comment inline on any issues."
)
Code specialist

PR review automation — inline comments on diffs, security flags, style nits.

Tools auto-wired
GitHub · read_file · grep_files · vision
Drop into any Agent or DeepAgent
Full roster
v1.0.7 · 12 new toolsNEW

Twelve new tools. Every role benefits.

Six core tools every agent can use — plus six SaaS integrations for the personas that need them. All opt-in. All discoverable via tool_search.

Browse all tools
Core toolseveryone benefits
GitHub

PRs, issues, branches, reviews.

GitLab

Merge requests, pipelines, releases.

SQL

SQLAlchemy, multi-dialect queries.

Vision

See screenshots, mockups, charts.

PDF

Extract text and tables from docs.

Observability

LangSmith + OpenTelemetry exporters.

Persona SaaSplug in when your agent needs it
Figma

Read frames, tokens, comments.

Salesforce

Leads, opps, accounts, pipeline.

Stripe

Charges, subs, invoices, payouts.

Google Sheets

Read, append, format ranges.

Zendesk

Tickets, users, organizations.

LinkedIn

Read-only people + company search.

Connectors17

One credential store. Every SaaS your agents need.

Register each service once with a CredentialRecord and every agent that loads the matching tool picks it up automatically. OAuth, API tokens, Basic auth, custom headers — all handled.

Read the setup guide
token → credential_store → tool
Gmail
OAuth2
Google Drive
OAuth2
Google Calendar
OAuth2
NEW
Google Sheets
OAuth2
Slack
Bot token
Linear
API key
Notion
Integration token
Jira
Email + token
Confluence
Email + token
HubSpot
Private app token
NEW
GitHub
PAT
NEW
GitLab
PAT
NEW
Figma
PAT
NEW
Salesforce
OAuth + instance URL
NEW
Stripe
Secret key
NEW
Zendesk
Email + token + subdomain
NEW
LinkedIn search
Vendor API
OAuth helpers built-inallow_writes gate per toolrate-limit awareone store, every tool
LLM providers9+

Bring your model. Or your proxy.

Native SDKs for Anthropic, OpenAI, and AWS Bedrock. Every other provider routes through LiteLLM — 100+ models, zero adapter code. Run a self-hosted LiteLLM proxy? Plug every agent into it with three fields.

LiteLLM proxy setup
model + api_base + api_key
Anthropic
Native SDK
Claude Opus 4 · Sonnet 4 · Haiku 4
OpenAI
Native SDK
gpt-4o · gpt-4.1 · o3 · o4-mini
AWS Bedrock
Native SDK
Claude · Nova · Llama · gpt-oss · Mistral · Titan
Google Gemini
Via LiteLLM
gemini-1.5-pro · 1.5-flash · 2.0
Vertex AI
Via LiteLLM
Gemini · Claude on Vertex · Llama on Vertex
Groq
Via LiteLLM
Llama 3.3 70B · Llama 3.1 405B
Together AI
Via LiteLLM
200+ open-weight models
Ollama
Via LiteLLM
Local: Llama · Qwen · DeepSeek · Mistral
LiteLLM direct
100+ providers
Any LiteLLM-supported model
Bring your own LiteLLM proxy
Running litellm --config as a central gateway? Point every Agent, Autopilot, ShipCrew at it with model + api_base + api_key. The proxy handles upstream credentials, rate-limiting, routing, cost tracking. Zero adapter code on your side.
Read the guide
Zero adapter codeEnv-var shortcutsStreaming supportedCost tracking built-in
The Deep Agent Factory

Cognitive architectures out of the box.

When a single agent loop isn't enough, switch to a Deep Agent. Gain planning, sub-agent delegation, self-reflection, and runtime tool creation.

goal_example.py
from shipit_agent.deep import GoalAgent, Goal
 
agent = GoalAgent.with_builtins(
llm=llm,
goal=Goal(
objective="Compare Python async libraries",
success_criteria=[
"Speed benchmarks",
"Memory usage comparison",
"Cites data sources"
]
)
)
 
result = agent.run()
print(result.goal_status) # "completed"
print(result.criteria_met) # [True, True, True]
Live Architecture
Goal
Planner
Execute
criteria_met: [True, True, True]
Ready to execute
Full Docs
Super RAG Subsystem

Zero-hallucination context injection.

A powerful, pluggable retrieval-augmented-generation subsystem built directly into the agent. Runs hybrid search (Vector + BM25 + Reciprocal Rank Fusion) out of the box with zero required dependencies.

Explore RAG Pipeline
Vector Store
Dense (Semantic)
BM25 Store
Sparse (Keywords)
Reciprocal Rank Fusion
AST Metadata + Exponential Time Decay
agent.run("Explain JWT rotation")
Injected Hit [1] lib/auth.ts (Score: 0.98)
Injected Hit [2] docs/auth.md (Score: 0.91)
Result attached to result.rag_sources
Prebuilt Agents

40 agents. One line of code.

Production-ready agent personas across 8 categories. Load by name, point at your project, and run. Security audits, code reviews, architecture analysis — instant.

security_agent.py
from shipit_agent import Agent
from shipit_agent.agents import AgentRegistry
 
registry = AgentRegistry.default()
agent_def = registry.get("security-auditor")
 
agent = Agent.with_builtins(
llm=llm,
prompt=agent_def.system_prompt(),
project_root="/path/to/my-project",
)
 
result = agent.run("Perform a full OWASP Top 10 audit")
print(result.output)
Agent Output
Audit Report
CRITSQL Injection
HIGHHardcoded Key
MEDMissing CSRF
LOWDebug Mode
4 findings detected
5 agents available
All 40 Agents
ShipCrew

Multi-agent crews that
collaborate.

DAG-based orchestration. Agents work in parallel, pass results downstream, and a coordinator ensures quality. Like OpenClaw, but built into your Python code.

🔍
researcher
Senior Researcher
📊
analyst
Data Analyst
✍️
writer
Technical Writer
🔎
reviewer
Quality Reviewer
data flows automatically via {output_key} templates
Three execution modes
Sequential

Tasks execute in topological order. Each task's output flows to the next via template variables.

Parallel

Independent tasks in the same DAG layer run concurrently. Merge at sync points automatically.

Hierarchical

Coordinator LLM dynamically assigns tasks, reviews output, and can request revisions.

# Build a crew in 10 lines
crew = ShipCrew(
name="report-crew",
agents=[researcher, analyst, writer],
tasks=[
ShipTask("research", ...),
ShipTask("analyze", ...),
ShipTask("draft", depends_on=["research", "analyze"]),
],
process="parallel",
)
# Stream events in real-time
for event in crew.stream(topic="AI agents"):
print(f"[{event.type}] {event.message}")
The Skills Factory

Domain knowledge as executable code.

Skills are runtime behavior packages. They auto-match your intent and inject specialized tools on the fly.

Skill Catalog
Skill Runtime
U
"Create a FastAPI REST API with User and Task models, CRUD endpoints, and a Dockerfile."
Full-Stack Developer
+write_file+edit_file+bash+run_code+plan_task
SUCCESS: Created 6 files — app/main.py, models.py, routes/tasks.py, database.py, requirements.txt, Dockerfile.
Complete
Active
13 tools injected
Explore Skills
Pipelines & Agent Teams

Deterministic logic meets dynamic reasoning.

Chain agents together like UNIX pipes. Use Pipeline.sequential() for strict step-by-step processing, or parallel() to fan-out sub-tasks concurrently.

Explore Agent Teams
multi_agent.py
from shipit_agent import Pipeline, step, parallel

pipe = Pipeline.sequential(
step("plan", agent=planner, prompt="Decompose: {objective}"),
parallel(
step("code_auth", agent=coder, prompt="Auth module from: {plan.output}"),
step("code_db", agent=coder, prompt="DB module from: {plan.output}")
),
step("verify", agent=verifier, prompt="Verify {code_auth.output} and {code_db.output}")
)

result = pipe.run(objective="Build a secure REST API")
Execution Graph
P
Planner
C1
Coder (Auth)
C2
Coder (DB)
V
Verifier
Real-Time Events

Watch the thought process unfold.

Under the hood, Shipit runs on a background thread and pushes AgentEvent objects through a thread-safe queue. Every tool invocation, planning step, and raw reasoning block reaches your loop the instant it's emitted—no buffering.

View Streaming API
agent.stream() loop
run_started
0.0s
mcp_attached
0.1s
planning_started
0.2s
reasoning_started
0.8s
tool_called
2.4s
tool_completed
4.1s
run_completed
5.5s
Advanced Memory System

Three memory types, one line of code.

Initialize with AgentMemory.default() and the agent handles conversation, semantic history, and entity tracking automatically.

conversation_memory.py
mem = ConversationMemory(
strategy="summary",
summary_llm=llm,
window_size=20
)
 
# Old messages → LLM summary
# Recent 20 → kept verbatim
msgs = mem.get_messages()
# [summary_msg, msg_81, ..., msg_100]
Visualization
buffer
window
summary
token
Notification Hub

Agents that report back.

Push status updates to Slack, Discord, and Telegram automatically. Budget alerts, tool failures, run completions — zero polling.

slack_notify.py
from shipit_agent import Agent
from shipit_agent.notifications import SlackNotifier, NotificationManager
 
slack = SlackNotifier(
webhook_url="https://hooks.slack.com/services/T.../B.../xxx",
channel="#agent-alerts",
username="ShipIt Agent",
)
 
manager = NotificationManager([slack])
agent = Agent.with_builtins(
llm=llm,
hooks=manager.as_hooks("security-auditor"),
)
 
# Slack receives: run_started, tool events, run_completed
result = agent.run("Audit the auth module for vulnerabilities")
Preview
Slack Block Kit
Security Auditor — Run Completed
Completed in 23.1s | Cost: $0.47
agent: security-auditor
duration: 23.1s
cost: $0.47
findings: 3 critical
14:32:24 | severity: warning
Zero external dependencies
Setup Guide
Developer Experience

Clean runtime. Observable execution.

We built Shipit to expose low-level control over its execution loop. Keep clean boundaries between your runtime, tools, policies, and profiles.

Zero Core Dependencies

Shipit keeps its footprint light. The base library requires only `pydantic`. Provider SDKs like `openai`, `anthropic`, or `litellm` are strictly opt-in extras.

Real-time Event Streaming

Watch the thought process unfold instantly. Every token, tool argument, reasoning block, and retry streams natively out of the agent loop.

Native MCP Integration

Instantly attach any remote or local Model Context Protocol server. Give your agent secure access to Linear, Slack, Postgres, or internal tooling with one line of code.

Pydantic Structured Output

Stop parsing raw text. Define your exact output schema using Pydantic models. The agent is forced to return strict, typed JSON—perfect for data pipelines.

main.py
from pydantic import BaseModel
from shipit_agent import Agent, Tool
class ResearchResult(BaseModel):
summary: str
confidence: float
# Bring your own LLM and Tools
agent = Agent.with_builtins(
llm=AnthropicChatLLM(model="claude-3-7-sonnet"),
tools=[web_search, git_status],
output_schema=ResearchResult
)
# Stream reasoning and tool calls natively
for chunk in agent.stream("Analyze recent commits"):
print(chunk.content, end="")
python main.py
<thinking> I need to run git_status first to see...
{ 'summary': '...', 'confidence': 0.95 }
Why SHIPIT

How we compare.

SHIPIT Agent is a library, not a framework. Small, focused, and observable. Here's how it stacks up against the alternatives.

Feature
SHIPIT
LangChain
CrewAI
AutoGen
Zero core dependencies
Native reasoning/thinking blocks
Bring your own LLM
MCP server integration
Parallel tool execution
Built-in Super RAG
Deep agent architectures
Pydantic structured output
Real-time event streaming
Extensible markdown skills
Bulletproof Bedrock pairing
Agent memory system

Complete Toolkit

Everything you need for autonomous engineering. No wrappers, no bloated abstractions — highly capable tools that plug directly into your workflow.

25+ Built-in Tools

web_search, open_url, bash, read_file, edit_file, write_file, run_code, plan_task, verify_output, sub_agent, and 15 more. All opt-in, all discoverable via tool_search.

Learn More

9 SaaS Connectors

Gmail, Google Drive, Slack, Linear, Jira, Notion, Confluence, GitHub, PostgreSQL. Each surfaces as agent tools — no wrapper code needed.

Learn More

100% Local & Secure

Your code stays on your machine. Shipit runs locally, isolates memory per project, and requires explicit permission for tool executions.

Learn More

Native MCP Integration

Attach any remote or local Model Context Protocol server. Give agents access to Linear, Slack, Postgres, or internal tooling with one line.

Learn More

Parallel Execution

When the LLM returns multiple tool calls, run them concurrently. Results stay in order. Typically 2-3x faster for multi-tool turns.

Learn More

Pydantic Structured Output

Define output schemas using Pydantic models. The agent returns strict, typed JSON — perfect for data pipelines and downstream systems.

Learn More

Extensible Markdown Skills

Drop a skill file and the agent treats it as an executable behavior package. Skills auto-match prompts and inject tools at runtime.

Learn More

Bulletproof Bedrock Pairing

Every toolUse gets a paired toolResult — even on errors, hallucinated tools, or planner output. Multi-iteration Bedrock loops just work.

Learn More

Bash & Code Execution

Full terminal support with sandboxed subprocess execution. Agents can run code, install packages, run tests, and create git commits.

Learn More

Autopilot runtime

Long-running, goal-driven, budget-gated. Runs until every success criterion is met or a budget trips. Atomic checkpoint every iteration; `autopilot.resume(run_id)` picks up after a crash.

Learn More

Reflection critic

Second-opinion reviewer scores every iteration against the goal criteria and feeds suggestions back into the next iteration. Confidence-gated early termination on confident agreement.

Learn More

Artifact collector

Claude-Desktop-style deliverables. Code fences + markdown docs auto-extracted from every iteration; tools push explicit artifacts via result metadata. Optional disk persistence for CI.

Learn More

Parallel fan-out

autopilot.fanout(items, template) dispatches N child Autopilots concurrently. Each child gets a budget-scaled slice so aggregate spend stays bounded. Per-child streaming.

Learn More

Scheduler daemon

Persistent JSON goal queue drained tick-by-tick. Crash-safe, SIGINT/SIGTERM-clean. systemd / launchd / Docker recipes in the docs turn any machine into a 24-hour Autopilot host.

Learn More

56+ role specialists

Prebuilt AgentDefinitions — architects, reviewers, debuggers, designers, PMs, sales reps, account execs, recruiters, finance analysts, support agents. Drop a specialist into an Agent or hand its prompt to an Autopilot.

Learn More
Ready to deploy

Start shipping today.

Whether you're exploring deep architectures, executing multi-step workflows, or integrating custom tools — Shipit is your autonomous Python engineer.

Get Started Free
$pip install 'shipit-agent[all]'