Build agentsthat think deeply.
Claude API power: server-side tools, document citations, the Batch API, and interleaved thinking — plus cross-provider prompt caching, on top of the v1.0.11 control plane.
Bring your own LLM, attach remote MCP servers, stream native thinking.
$pip install 'shipit-agent[all]'See it run. Every pattern, live.
Six lines of Python, six ways to ship. Tap a tab and watch the agent actually work — streaming tokens, calling tools, reasoning in steps.
Five flagship features
that beat the competition.
Validation retry that stays in your conversation. Process supervision in a single constructor argument. ChatGPT-style memory, principled. Replay.io for AI agents. A self-hostable Devin. All five reachable from from shipit_agent import ….
from pydantic import BaseModelfrom shipit_agent import Agentclass Movie(BaseModel):title: strrating: floatresult = agent.run("Recommend a thriller.",output_schema=Movie,max_validation_retries=2,)print(result.parsed) # Movie(title='Heat', rating=8.5)
LangChain's OutputFixingParser runs a fresh LLM call. Ours stays in the same conversation, so the model has full context of what it tried.
Run for 24 hours. For every role.
Autopilot turns any agent — developer, designer, sales rep, analyst, support — into a budget-gated, checkpointed worker. Cumulative budgets across resume · SIGTERM-safe · dollar tracking · dashboards as artifacts · your own LiteLLM proxy in three fields. Fan out, reflect with a critic, stream every step.
From sales to design to ops. One framework.
shipit-agent isn't just for developers anymore. Nine new role specialists ship with the framework — plug any of them into an Agent or an Autopilot and watch them pick up the tools they need.
Code Reviewer Bot
Reviews PRs, cites lines, insists on tests.
Release Engineer
Verifies CI, tags, ships, posts announcement.
Figma Designer
Reviews designs, extracts tokens, writes handoffs.
Sales Rep
Enriches leads, drafts outreach, logs to Salesforce.
Account Executive
Pipeline reviews, account health, forecasts.
Sales Ops
Data-quality audits, pipeline dashboards.
Technical Recruiter
Targeted sourcing, short outreach, candidate tracking.
Finance Analyst
Month-end close, cash flow, contract reviews.
Support Agent
Triages tickets, reads screenshots, flags incidents.
Load any persona
by name. One line.
56 prebuilt specialists across code, design, sales, marketing, finance, support. Each one is a profile — name, system prompt, suggested tools, iteration budget. Hand it to your Agent and ship.
from shipit_agent.agents import AgentRegistryregistry = AgentRegistry.default()spec = registry.get("code-reviewer-bot")
from shipit_agent import Agentfrom shipit_agent.agents import AgentRegistryspec = AgentRegistry.default().get("code-reviewer-bot")agent = Agent.with_builtins(llm=opus_llm,prompt=spec.system_prompt(),max_iterations=spec.max_iterations,)result = agent.run("Review pull request shipiit/shipit_agent#142. ""Comment inline on any issues.")
PR review automation — inline comments on diffs, security flags, style nits.
Twelve new tools. Every role benefits.
Six core tools every agent can use — plus six SaaS integrations for the personas that need them. All opt-in. All discoverable via tool_search.
PRs, issues, branches, reviews.
Merge requests, pipelines, releases.
SQLAlchemy, multi-dialect queries.
See screenshots, mockups, charts.
Extract text and tables from docs.
LangSmith + OpenTelemetry exporters.
Read frames, tokens, comments.
Leads, opps, accounts, pipeline.
Charges, subs, invoices, payouts.
Read, append, format ranges.
Tickets, users, organizations.
Read-only people + company search.
One credential store. Every SaaS your agents need.
Register each service once with a CredentialRecord and every agent that loads the matching tool picks it up automatically. OAuth, API tokens, Basic auth, custom headers — all handled.
Bring your model. Or your proxy.
Native SDKs for Anthropic, OpenAI, and AWS Bedrock. Every other provider routes through LiteLLM — 100+ models, zero adapter code. Run a self-hosted LiteLLM proxy? Plug every agent into it with three fields.
litellm --config as a central gateway? Point every Agent, Autopilot, ShipCrew at it with model + api_base + api_key. The proxy handles upstream credentials, rate-limiting, routing, cost tracking. Zero adapter code on your side.Cognitive architectures
out of the box.
When a single agent loop isn't enough, switch to a Deep Agent. Gain planning, sub-agent delegation, self-reflection, and runtime tool creation.
from shipit_agent.deep import GoalAgent, Goalagent = GoalAgent.with_builtins(llm=llm,goal=Goal(objective="Compare Python async libraries",success_criteria=["Speed benchmarks","Memory usage comparison","Cites data sources"]))result = agent.run()print(result.goal_status) # "completed"print(result.criteria_met) # [True, True, True]
Zero-hallucination
context injection.
A powerful, pluggable retrieval-augmented-generation subsystem built directly into the agent. Runs hybrid search (Vector + BM25 + Reciprocal Rank Fusion) out of the box with zero required dependencies.
Explore RAG Pipelineresult.rag_sources40 agents.
One line of code.
Production-ready agent personas across 8 categories. Load by name, point at your project, and run. Security audits, code reviews, architecture analysis — instant.
from shipit_agent import Agentfrom shipit_agent.agents import AgentRegistryregistry = AgentRegistry.default()agent_def = registry.get("security-auditor")agent = Agent.with_builtins(llm=llm,prompt=agent_def.system_prompt(),project_root="/path/to/my-project",)result = agent.run("Perform a full OWASP Top 10 audit")print(result.output)
Multi-agent crews that
collaborate.
DAG-based orchestration. Agents work in parallel, pass results downstream, and a coordinator ensures quality. Like OpenClaw, but built into your Python code.
Tasks execute in topological order. Each task's output flows to the next via template variables.
Independent tasks in the same DAG layer run concurrently. Merge at sync points automatically.
Coordinator LLM dynamically assigns tasks, reviews output, and can request revisions.
Domain knowledge
as executable code.
Skills are runtime behavior packages. They auto-match your intent and inject specialized tools on the fly.
Deterministic logic meets
dynamic reasoning.
Chain agents together like UNIX pipes. Use Pipeline.sequential() for strict step-by-step processing, or parallel() to fan-out sub-tasks concurrently.
Explore Agent TeamsWatch the thought process unfold.
Under the hood, Shipit runs on a background thread and pushes AgentEvent objects through a thread-safe queue. Every tool invocation, planning step, and raw reasoning block reaches your loop the instant it's emitted—no buffering.
View Streaming APIThree memory types,
one line of code.
Initialize with AgentMemory.default() and the agent handles conversation, semantic history, and entity tracking automatically.
mem = ConversationMemory(strategy="summary",summary_llm=llm,window_size=20)# Old messages → LLM summary# Recent 20 → kept verbatimmsgs = mem.get_messages()# [summary_msg, msg_81, ..., msg_100]
Agents that
report back.
Push status updates to Slack, Discord, and Telegram automatically. Budget alerts, tool failures, run completions — zero polling.
from shipit_agent import Agentfrom shipit_agent.notifications import SlackNotifier, NotificationManagerslack = SlackNotifier(webhook_url="https://hooks.slack.com/services/T.../B.../xxx",channel="#agent-alerts",username="ShipIt Agent",)manager = NotificationManager([slack])agent = Agent.with_builtins(llm=llm,hooks=manager.as_hooks("security-auditor"),)# Slack receives: run_started, tool events, run_completedresult = agent.run("Audit the auth module for vulnerabilities")
Clean runtime.
Observable execution.
We built Shipit to expose low-level control over its execution loop. Keep clean boundaries between your runtime, tools, policies, and profiles.
Zero Core Dependencies
Shipit keeps its footprint light. The base library requires only `pydantic`. Provider SDKs like `openai`, `anthropic`, or `litellm` are strictly opt-in extras.
Real-time Event Streaming
Watch the thought process unfold instantly. Every token, tool argument, reasoning block, and retry streams natively out of the agent loop.
How we compare.
SHIPIT Agent is a library, not a framework. Small, focused, and observable. Here's how it stacks up against the alternatives.
Complete Toolkit
Everything you need for autonomous engineering. No wrappers, no bloated abstractions — highly capable tools that plug directly into your workflow.
25+ Built-in Tools
web_search, open_url, bash, read_file, edit_file, write_file, run_code, plan_task, verify_output, sub_agent, and 15 more. All opt-in, all discoverable via tool_search.
Learn More→9 SaaS Connectors
Gmail, Google Drive, Slack, Linear, Jira, Notion, Confluence, GitHub, PostgreSQL. Each surfaces as agent tools — no wrapper code needed.
Learn More→100% Local & Secure
Your code stays on your machine. Shipit runs locally, isolates memory per project, and requires explicit permission for tool executions.
Learn More→Native MCP Integration
Attach any remote or local Model Context Protocol server. Give agents access to Linear, Slack, Postgres, or internal tooling with one line.
Learn More→Parallel Execution
When the LLM returns multiple tool calls, run them concurrently. Results stay in order. Typically 2-3x faster for multi-tool turns.
Learn More→Pydantic Structured Output
Define output schemas using Pydantic models. The agent returns strict, typed JSON — perfect for data pipelines and downstream systems.
Learn More→Extensible Markdown Skills
Drop a skill file and the agent treats it as an executable behavior package. Skills auto-match prompts and inject tools at runtime.
Learn More→Bulletproof Bedrock Pairing
Every toolUse gets a paired toolResult — even on errors, hallucinated tools, or planner output. Multi-iteration Bedrock loops just work.
Learn More→Bash & Code Execution
Full terminal support with sandboxed subprocess execution. Agents can run code, install packages, run tests, and create git commits.
Learn More→Autopilot runtime
Long-running, goal-driven, budget-gated. Runs until every success criterion is met or a budget trips. Atomic checkpoint every iteration; `autopilot.resume(run_id)` picks up after a crash.
Learn More→Reflection critic
Second-opinion reviewer scores every iteration against the goal criteria and feeds suggestions back into the next iteration. Confidence-gated early termination on confident agreement.
Learn More→Artifact collector
Claude-Desktop-style deliverables. Code fences + markdown docs auto-extracted from every iteration; tools push explicit artifacts via result metadata. Optional disk persistence for CI.
Learn More→Parallel fan-out
autopilot.fanout(items, template) dispatches N child Autopilots concurrently. Each child gets a budget-scaled slice so aggregate spend stays bounded. Per-child streaming.
Learn More→Scheduler daemon
Persistent JSON goal queue drained tick-by-tick. Crash-safe, SIGINT/SIGTERM-clean. systemd / launchd / Docker recipes in the docs turn any machine into a 24-hour Autopilot host.
Learn More→56+ role specialists
Prebuilt AgentDefinitions — architects, reviewers, debuggers, designers, PMs, sales reps, account execs, recruiters, finance analysts, support agents. Drop a specialist into an Agent or hand its prompt to an Autopilot.
Learn More→Start shipping
today.
Whether you're exploring deep architectures, executing multi-step workflows, or integrating custom tools — Shipit is your autonomous Python engineer.
$pip install 'shipit-agent[all]'