The Deep Agent Factory v1.0.12

Build agentsthat think deeply.

Name: SHIPIT Agent
Author: SHIPIT

Claude API power: server-side tools, document citations, the Batch API, and interleaved thinking — plus cross-provider prompt caching, on top of the v1.0.11 control plane.
Bring your own LLM, attach remote MCP servers, stream native thinking.

Live stream — autopilot.stream()

run_started · goal received

iteration 2 · 2/3 criteria met

artifact · iter2-block1.py

critic · confidence=0.87

Get Started

$pip install 'shipit-agent[all]'

1800+Tests Passing

11LLM Providers

30+Built-in Tools

40++Built-in Agents

0Core Dependencies

goal_example.py

Live

MCP: Connected

Python 3.12

42.4 MB

805+ TestsUnit + E2E · all passing

9 ProvidersOpenAI · Anthropic · Bedrock · Gemini…

30+ ToolsBuilt-in & extensible

47 SpecialistsPrebuilt & ready

AutopilotLong-running · v1.0.6

Reflection criticSecond-opinion reviewer

ArtifactsClaude-Desktop-style

Fan-outParallel batches

ShipCrewMulti-agent DAG

100% LocalNo data leaves

Open SourceMIT licensed

Zero DepsOnly Pydantic

New in v1.0.15 — The Super Agent

One agent,
every sector.

Sector specialists in one line, a prebuilt MCP catalog, polished PDF / Excel / PowerPoint deliverables, Claude-Code-style tool logs, and cron-style scheduling — with any LLM provider.

roles_agent.py

from shipit_agent import Agent
 
# One line to a sector specialist — any provider's LLM
analyst  = Agent.for_role("finance-analyst", llm=llm)
writer   = Agent.for_role("marketing-writer", llm=llm)
dev      = Agent.for_role("generalist-developer", llm=llm)
designer = Agent.for_role("figma-designer", llm=llm)
 
# The definition picks its own tools from the builtins:
# finance-analyst → sql, google_sheets, pdf,
#                   render_dashboard, stripe, build_document
result = analyst.run("Close Q2 and hand me the workbook.")
 
# Unknown ids fail helpfully:
Agent.for_role("finance", llm=llm)
# ValueError: Unknown role 'finance'.
#             Did you mean: finance-analyst?

Preview

40+ prebuilt specialists

finance-analystFinance

marketing-writerMarketing

generalist-developerEngineering

figma-designerDesign

researcherResearch

sales-repSales

▶ Works with all 9 LLM providers

Full Guide

Live Examples

See it run. Every pattern, live.

Six lines of Python, six ways to ship. Tap a tab and watch the agent actually work — streaming tokens, calling tools, reasoning in steps.

basic.py

Live

from shipit_agent import Agent

from shipit_agent.llms import build_llm_from_env

agent = Agent.with_builtins(llm=build_llm_from_env())

result = agent.run("Summarize this quarter's sales report.")

print(result.output)

output · live

shipit-agent

Python 3.12

Basic agent

Flagship features · since v1.0.8

Five flagship features
that beat the competition.

Validation retry that stays in your conversation. Process supervision in a single constructor argument. ChatGPT-style memory, principled. Replay.io for AI agents. A self-hostable Devin. All five reachable from from shipit_agent import ….

structured-output.py

from pydantic import BaseModel
from shipit_agent import Agent
 
class Movie(BaseModel):
    title: str
    rating: float
 
result = agent.run(
    "Recommend a thriller.",
    output_schema=Movie,
    max_validation_retries=2,
)
print(result.parsed)  # Movie(title='Heat', rating=8.5)

Why this beats the competition

Beats LangChain

LangChain's OutputFixingParser runs a fresh LLM call. Ours stays in the same conversation, so the model has full context of what it tried.

▶Available on Agent and DeepAgent

Full Docs

+318 tests·1508 passing·0 regressions·5 docs pages·5 notebooks

v1.0.7 · Autopilot · Agents for every roleNEW

Run for 24 hours. For every role.

Autopilot turns any agent — developer, designer, sales rep, analyst, support — into a budget-gated, checkpointed worker. Cumulative budgets across resume · SIGTERM-safe · dollar tracking · dashboards as artifacts · your own LiteLLM proxy in three fields. Fan out, reflect with a critic, stream every step.

See the full page

live:run_started

Runtime Critic Artifacts Fan-out Daemon Specialists Cost router Async ask Vision Sandbox Bulletproof 24h Dashboards LiteLLM proxy

6 primitivesBudget-gatedAtomic checkpointsLive streaming805 tests8 notebooksRead more

v1.0.7 · Agents for every roleNEW

From sales to design to ops. One framework.

shipit-agent isn't just for developers anymore. Nine new role specialists ship with the framework — plug any of them into an Agent or an Autopilot and watch them pick up the tools they need.

Code Reviewer Bot

Reviews PRs, cites lines, insists on tests.

githubread_filegrep_filesvision

code-reviewer-bot

Release Engineer

Verifies CI, tags, ships, posts announcement.

githubbashslack

release-engineer

Figma Designer

Reviews designs, extracts tokens, writes handoffs.

figmavisionrender_dashboard

figma-designer

Sales Rep

Enriches leads, drafts outreach, logs to Salesforce.

salesforcelinkedin_searchgmail_search

sales-rep

Account Executive

Pipeline reviews, account health, forecasts.

salesforcelinkedin_searchsql

account-executive

Sales Ops

Data-quality audits, pipeline dashboards.

salesforcesqlgoogle_sheets

sales-ops

Technical Recruiter

Targeted sourcing, short outreach, candidate tracking.

linkedin_searchgoogle_calendargoogle_sheets

recruiter

Finance Analyst

Month-end close, cash flow, contract reviews.

stripepdfsqlrender_dashboard

finance-analyst

Support Agent

Triages tickets, reads screenshots, flags incidents.

zendeskvisionslack

customer-support-agent

9 new personas56+ prebuilt totalLoad by nameDrop-in toolsBrowse all roles

AgentRegistry · 56 specialists

Load any persona
by name. One line.

56 prebuilt specialists across code, design, sales, marketing, finance, support. Each one is a profile — name, system prompt, suggested tools, iteration budget. Hand it to your Agent and ship.

load_specialist.py

from shipit_agent.agents import AgentRegistry
 
registry = AgentRegistry.default()
spec = registry.get("code-reviewer-bot")

code-reviewer-bot.py

from shipit_agent import Agent
from shipit_agent.agents import AgentRegistry
 
spec = AgentRegistry.default().get("code-reviewer-bot")
 
agent = Agent.with_builtins(
    llm=opus_llm,
    prompt=spec.system_prompt(),
    max_iterations=spec.max_iterations,
)
 
result = agent.run(
    "Review pull request shipiit/shipit_agent#142. "
    "Comment inline on any issues."
)

Code specialist

PR review automation — inline comments on diffs, security flags, style nits.

Tools auto-wired

GitHub · read_file · grep_files · vision

▶Drop into any Agent or DeepAgent

Full roster

v1.0.7 · 12 new toolsNEW

Twelve new tools. Every role benefits.

Six core tools every agent can use — plus six SaaS integrations for the personas that need them. All opt-in. All discoverable via tool_search.

Browse all tools

Core toolseveryone benefits

GitHub

PRs, issues, branches, reviews.

GitLab

Merge requests, pipelines, releases.

SQL

SQLAlchemy, multi-dialect queries.

Vision

See screenshots, mockups, charts.

PDF

Extract text and tables from docs.

Observability

LangSmith + OpenTelemetry exporters.

Persona SaaSplug in when your agent needs it

Figma

Read frames, tokens, comments.

Salesforce

Leads, opps, accounts, pipeline.

Stripe

Charges, subs, invoices, payouts.

Google Sheets

Read, append, format ranges.

Zendesk

Tickets, users, organizations.

Read-only people + company search.

Connectors17

One credential store. Every SaaS your agents need.

Register each service once with a CredentialRecord and every agent that loads the matching tool picks it up automatically. OAuth, API tokens, Basic auth, custom headers — all handled.

Read the setup guide

token → credential_store → tool

Gmail

OAuth2

Google Drive

OAuth2

Google Calendar

OAuth2

NEW

Google Sheets

OAuth2

Slack

Bot token

Linear

API key

Notion

Integration token

Jira

Email + token

Confluence

Email + token

HubSpot

Private app token

NEW

GitHub

PAT

NEW

GitLab

PAT

NEW

Figma

PAT

NEW

Salesforce

OAuth + instance URL

NEW

Stripe

Secret key

NEW

Zendesk

Email + token + subdomain

NEW

LinkedIn search

Vendor API

OAuth helpers built-inallow_writes gate per toolrate-limit awareone store, every tool

LLM providers9+

Bring your model. Or your proxy.

Native SDKs for Anthropic, OpenAI, and AWS Bedrock. Every other provider routes through LiteLLM — 100+ models, zero adapter code. Run a self-hosted LiteLLM proxy? Plug every agent into it with three fields.

LiteLLM proxy setup

model + api_base + api_key

Anthropic

Native SDK

Claude Opus 4 · Sonnet 4 · Haiku 4

OpenAI

Native SDK

gpt-4o · gpt-4.1 · o3 · o4-mini

AWS Bedrock

Native SDK

Claude · Nova · Llama · gpt-oss · Mistral · Titan

Google Gemini

Via LiteLLM

gemini-1.5-pro · 1.5-flash · 2.0

Vertex AI

Via LiteLLM

Gemini · Claude on Vertex · Llama on Vertex

Groq

Via LiteLLM

Llama 3.3 70B · Llama 3.1 405B

Together AI

Via LiteLLM

200+ open-weight models

Ollama

Via LiteLLM

Local: Llama · Qwen · DeepSeek · Mistral

LiteLLM direct

100+ providers

Any LiteLLM-supported model

Bring your own LiteLLM proxy

Running litellm --config as a central gateway? Point every Agent, Autopilot, ShipCrew at it with model + api_base + api_key. The proxy handles upstream credentials, rate-limiting, routing, cost tracking. Zero adapter code on your side.

Read the guide

Zero adapter codeEnv-var shortcutsStreaming supportedCost tracking built-in

The Deep Agent Factory

Cognitive architectures
out of the box.

When a single agent loop isn't enough, switch to a Deep Agent. Gain planning, sub-agent delegation, self-reflection, and runtime tool creation.

goal_example.py

from shipit_agent.deep import GoalAgent, Goal
 
agent = GoalAgent.with_builtins(
    llm=llm,
    goal=Goal(
        objective="Compare Python async libraries",
        success_criteria=[
            "Speed benchmarks",
            "Memory usage comparison",
            "Cites data sources"
        ]
    )
)
 
result = agent.run()
print(result.goal_status)   # "completed"
print(result.criteria_met)  # [True, True, True]

Live Architecture

Goal

Planner

Execute

criteria_met: [True, True, True]

▶ Ready to execute

Full Docs

Super RAG Subsystem

Zero-hallucination
context injection.

A powerful, pluggable retrieval-augmented-generation subsystem built directly into the agent. Runs hybrid search (Vector + BM25 + Reciprocal Rank Fusion) out of the box with zero required dependencies.

Explore RAG Pipeline

Vector Store

Dense (Semantic)

BM25 Store

Sparse (Keywords)

Reciprocal Rank Fusion

AST Metadata + Exponential Time Decay

agent.run("Explain JWT rotation")

✓ Injected Hit [1] lib/auth.ts (Score: 0.98)

✓ Injected Hit [2] docs/auth.md (Score: 0.91)

Result attached to result.rag_sources

Prebuilt Agents

40 agents.
One line of code.

Production-ready agent personas across 8 categories. Load by name, point at your project, and run. Security audits, code reviews, architecture analysis — instant.

security_agent.py

from shipit_agent import Agent
from shipit_agent.agents import AgentRegistry
 
registry = AgentRegistry.default()
agent_def = registry.get("security-auditor")
 
agent = Agent.with_builtins(
    llm=llm,
    prompt=agent_def.system_prompt(),
    project_root="/path/to/my-project",
)
 
result = agent.run("Perform a full OWASP Top 10 audit")
print(result.output)

Agent Output

Audit Report

CRITSQL Injection

HIGHHardcoded Key

MEDMissing CSRF

LOWDebug Mode

4 findings detected

▶ 5 agents available

All 40 Agents

ShipCrew

Multi-agent crews that
collaborate.

DAG-based orchestration. Agents work in parallel, pass results downstream, and a coordinator ensures quality. Like OpenClaw, but built into your Python code.

🔍

researcher

Senior Researcher

📊

analyst

Data Analyst

✍️

writer

Technical Writer

🔎

reviewer

Quality Reviewer

data flows automatically via {output_key} templates

Three execution modes

Sequential

Tasks execute in topological order. Each task's output flows to the next via template variables.

Parallel

Independent tasks in the same DAG layer run concurrently. Merge at sync points automatically.

Hierarchical

Coordinator LLM dynamically assigns tasks, reviews output, and can request revisions.

# Build a crew in 10 lines

crew = ShipCrew(

name="report-crew",

agents=[researcher, analyst, writer],

tasks=[

ShipTask("research", ...),

ShipTask("analyze", ...),

ShipTask("draft", depends_on=["research", "analyze"]),

process="parallel",

)

# Stream events in real-time

for event in crew.stream(topic="AI agents"):

print(f"[{event.type}] {event.message}")

ShipCrew documentation→

The Skills Factory

Domain knowledge
as executable code.

Skills are runtime behavior packages. They auto-match your intent and inject specialized tools on the fly.

Skill Catalog

Skill Runtime

"Create a FastAPI REST API with User and Task models, CRUD endpoints, and a Dockerfile."

Full-Stack Developer

+write_file+edit_file+bash+run_code+plan_task

SUCCESS: Created 6 files — app/main.py, models.py, routes/tasks.py, database.py, requirements.txt, Dockerfile.

Complete

Active

13 tools injected

Explore Skills

Pipelines & Agent Teams

Deterministic logic meets
dynamic reasoning.

Chain agents together like UNIX pipes. Use Pipeline.sequential() for strict step-by-step processing, or parallel() to fan-out sub-tasks concurrently.

Explore Agent Teams

multi_agent.py

from shipit_agent import Pipeline, step, parallel

pipe = Pipeline.sequential(

step("plan", agent=planner, prompt="Decompose: {objective}"),

parallel(

step("code_auth", agent=coder, prompt="Auth module from: {plan.output}"),

step("code_db", agent=coder, prompt="DB module from: {plan.output}")

step("verify", agent=verifier, prompt="Verify {code_auth.output} and {code_db.output}")

)

result = pipe.run(objective="Build a secure REST API")

Execution Graph

Planner

Coder (Auth)

Coder (DB)

Verifier

Real-Time Events

Watch the thought process unfold.

Under the hood, Shipit runs on a background thread and pushes AgentEvent objects through a thread-safe queue. Every tool invocation, planning step, and raw reasoning block reaches your loop the instant it's emitted—no buffering.

View Streaming API

agent.stream() loop

run_started

Initialize Agent(llm=gpt-4o) 0.0s

mcp_attached

Connected to local file system server 0.1s

planning_started

Invoking internal plan_task router 0.2s

reasoning_started

🧠 LLM response contained <thinking> block 0.8s

tool_called

Called web_search(query='Bitcoin USD') 2.4s

tool_completed

Returned 5 articles on crypto markets 4.1s

run_completed

Final answer generated. 8 total steps. 5.5s

Advanced Memory System

Three memory types,
one line of code.

Initialize with AgentMemory.default() and the agent handles conversation, semantic history, and entity tracking automatically.

conversation_memory.py

mem = ConversationMemory(
    strategy="summary",
    summary_llm=llm,
    window_size=20
)
 
# Old messages → LLM summary
# Recent 20 → kept verbatim
msgs = mem.get_messages()
# [summary_msg, msg_81, ..., msg_100]

Visualization

buffer

window

summary

token

Memory Docs

Notification Hub

Agents that
report back.

Push status updates to Slack, Discord, and Telegram automatically. Budget alerts, tool failures, run completions — zero polling.

slack_notify.py

from shipit_agent import Agent
from shipit_agent.notifications import SlackNotifier, NotificationManager
 
slack = SlackNotifier(
    webhook_url="https://hooks.slack.com/services/T.../B.../xxx",
    channel="#agent-alerts",
    username="ShipIt Agent",
)
 
manager = NotificationManager([slack])
agent = Agent.with_builtins(
    llm=llm,
    hooks=manager.as_hooks("security-auditor"),
)
 
# Slack receives: run_started, tool events, run_completed
result = agent.run("Audit the auth module for vulnerabilities")

Preview

Slack Block Kit

Security Auditor — Run Completed

Completed in 23.1s | Cost: $0.47

agent: security-auditor

duration: 23.1s

cost: $0.47

findings: 3 critical

14:32:24 | severity: warning

▶ Zero external dependencies

Setup Guide

Developer Experience

Clean runtime.
Observable execution.

We built Shipit to expose low-level control over its execution loop. Keep clean boundaries between your runtime, tools, policies, and profiles.

Zero Core Dependencies

Shipit keeps its footprint light. The base library requires only `pydantic`. Provider SDKs like `openai`, `anthropic`, or `litellm` are strictly opt-in extras.

Real-time Event Streaming

Watch the thought process unfold instantly. Every token, tool argument, reasoning block, and retry streams natively out of the agent loop.

Native MCP Integration

Instantly attach any remote or local Model Context Protocol server. Give your agent secure access to Linear, Slack, Postgres, or internal tooling with one line of code.

Pydantic Structured Output

Stop parsing raw text. Define your exact output schema using Pydantic models. The agent is forced to return strict, typed JSON—perfect for data pipelines.

main.py

from pydantic import BaseModel

from shipit_agent import Agent, Tool

class ResearchResult(BaseModel):

summary: str

confidence: float

# Bring your own LLM and Tools

agent = Agent.with_builtins(

llm=AnthropicChatLLM(model="claude-3-7-sonnet"),

tools=[web_search, git_status],

output_schema=ResearchResult

)

# Stream reasoning and tool calls natively

for chunk in agent.stream("Analyze recent commits"):

print(chunk.content, end="")

▶ python main.py

<thinking> I need to run git_status first to see...

{ 'summary': '...', 'confidence': 0.95 }

Why SHIPIT

How we compare.

SHIPIT Agent is a library, not a framework. Small, focused, and observable. Here's how it stacks up against the alternatives.

Feature

SHIPIT

LangChain

CrewAI

AutoGen

Zero core dependencies

Native reasoning/thinking blocks

Bring your own LLM

MCP server integration

Parallel tool execution

Built-in Super RAG

Deep agent architectures

Pydantic structured output

Real-time event streaming

Extensible markdown skills

Bulletproof Bedrock pairing

Agent memory system

See the full architecture

Complete Toolkit

Everything you need for autonomous engineering. No wrappers, no bloated abstractions — highly capable tools that plug directly into your workflow.

25+ Built-in Tools

web_search, open_url, bash, read_file, edit_file, write_file, run_code, plan_task, verify_output, sub_agent, and 15 more. All opt-in, all discoverable via tool_search.

Learn More→

9 SaaS Connectors

Gmail, Google Drive, Slack, Linear, Jira, Notion, Confluence, GitHub, PostgreSQL. Each surfaces as agent tools — no wrapper code needed.

Learn More→

100% Local & Secure

Your code stays on your machine. Shipit runs locally, isolates memory per project, and requires explicit permission for tool executions.

Learn More→

Native MCP Integration

Attach any remote or local Model Context Protocol server. Give agents access to Linear, Slack, Postgres, or internal tooling with one line.

Learn More→

Parallel Execution

When the LLM returns multiple tool calls, run them concurrently. Results stay in order. Typically 2-3x faster for multi-tool turns.

Learn More→

Pydantic Structured Output

Define output schemas using Pydantic models. The agent returns strict, typed JSON — perfect for data pipelines and downstream systems.

Learn More→

Extensible Markdown Skills

Drop a skill file and the agent treats it as an executable behavior package. Skills auto-match prompts and inject tools at runtime.

Learn More→

Bulletproof Bedrock Pairing

Every toolUse gets a paired toolResult — even on errors, hallucinated tools, or planner output. Multi-iteration Bedrock loops just work.

Learn More→

Bash & Code Execution

Full terminal support with sandboxed subprocess execution. Agents can run code, install packages, run tests, and create git commits.

Learn More→

Autopilot runtime

Long-running, goal-driven, budget-gated. Runs until every success criterion is met or a budget trips. Atomic checkpoint every iteration; `autopilot.resume(run_id)` picks up after a crash.

Learn More→

Reflection critic

Second-opinion reviewer scores every iteration against the goal criteria and feeds suggestions back into the next iteration. Confidence-gated early termination on confident agreement.

Learn More→

Artifact collector

Claude-Desktop-style deliverables. Code fences + markdown docs auto-extracted from every iteration; tools push explicit artifacts via result metadata. Optional disk persistence for CI.

Learn More→

Parallel fan-out

autopilot.fanout(items, template) dispatches N child Autopilots concurrently. Each child gets a budget-scaled slice so aggregate spend stays bounded. Per-child streaming.

Learn More→

Scheduler daemon

Persistent JSON goal queue drained tick-by-tick. Crash-safe, SIGINT/SIGTERM-clean. systemd / launchd / Docker recipes in the docs turn any machine into a 24-hour Autopilot host.

Learn More→

56+ role specialists

Prebuilt AgentDefinitions — architects, reviewers, debuggers, designers, PMs, sales reps, account execs, recruiters, finance analysts, support agents. Drop a specialist into an Agent or hand its prompt to an Autopilot.

Learn More→

Ready to deploy

Start shipping
today.

Whether you're exploring deep architectures, executing multi-step workflows, or integrating custom tools — Shipit is your autonomous Python engineer.

Get Started Free

$pip install 'shipit-agent[all]'

Build agentsthat think deeply.

One agent, every sector.

Every Sector

MCP Catalog

Real Deliverables

Clean Logs

Scheduled Jobs

See it run. Every pattern, live.

Five flagship features that beat the competition.

Structured output with auto-retry

Verifier network

Episodic memory consolidation

Time-travel replay

ComputerUseAgent

Run for 24 hours. For every role.

From sales to design to ops. One framework.

Code Reviewer Bot

Release Engineer

Figma Designer

Sales Rep

Account Executive

Sales Ops

Technical Recruiter

Finance Analyst

Support Agent

Load any persona by name. One line.

Twelve new tools. Every role benefits.

One credential store. Every SaaS your agents need.

Bring your model. Or your proxy.

Cognitive architectures out of the box.

GoalAgent

Supervisor

ReflectiveAgent

AdaptiveAgent

PersistentAgent

AgentBenchmark

Zero-hallucination context injection.

40 agents. One line of code.

Security

Code Quality

DevOps

Architecture

Testing

Research

Planning

Content

Multi-agent crews thatcollaborate.

Domain knowledge as executable code.

Full-Stack Developer

Security Engineer

Database Architect

Web Scraper Pro

DevOps Automation

Skills + Agent

Skills + DeepAgent

Deterministic logic meets dynamic reasoning.

Watch the thought process unfold.

Three memory types, one line of code.

Conversation Memory

Semantic Search

Entity Tracking

Agent + Memory

Memory + RAG

Agents that report back.

Slack

Discord

Telegram

Multi-Channel

Event Feed

Clean runtime. Observable execution.

Zero Core Dependencies

Real-time Event Streaming

Native MCP Integration

Pydantic Structured Output

How we compare.

Complete Toolkit

25+ Built-in Tools

9 SaaS Connectors

100% Local & Secure

Native MCP Integration

One agent,
every sector.

Five flagship features
that beat the competition.

Load any persona
by name. One line.

Cognitive architectures
out of the box.

Zero-hallucination
context injection.

40 agents.
One line of code.

Multi-agent crews that
collaborate.

Domain knowledge
as executable code.

Deterministic logic meets
dynamic reasoning.

Three memory types,
one line of code.

Agents that
report back.

Clean runtime.
Observable execution.

Start shipping
today.