v1.0.7 · Autopilot · Agents for every role · NEW

Long-running agents.
Stream every step.

Name: SHIPIT Agent
Author: SHIPIT

Autopilot turns any agent into a budget-gated, checkpointed, goal-driven worker. Fan out across batches, reflect with a critic, collect artifacts, resume after crashes. Claude-Desktop-style autonomy, in Python.


1190 tests · 56 specialists · 53 notebooks

shipit autopilot "migrate SQL" --format tui

run_started · Goal: migrate SQL calls · 3 criteria · 0.0s

The loop

Seven stages. One long-running brain.

Autopilot wraps the Agent in a loop that keeps working until every success criterion is satisfied or a budget trips. Every stage is replaceable, and every event streams live.

Goal: objective + success criteria

Iteration: tool calls and tokens

Critic: confidence-gated review

Artifacts: code, markdown, files

Fan-out: N children in parallel

Daemon: 24h scheduler and resume

Specialists: 56 built-in roles
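The control flow of the loop above can be pictured roughly as follows. This is a minimal, self-contained sketch, not shipit-agent's implementation: `run_loop`, `step`, and `critic` are hypothetical stand-ins for one agent iteration and the reflection critic, and a plain iteration cap stands in for the full budget policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    status: str                      # "completed" or "budget_exhausted"
    iterations: int
    outputs: List[str] = field(default_factory=list)


def run_loop(
    criteria: List[str],
    step: Callable[[int], str],                      # hypothetical: one agent iteration
    critic: Callable[[str, List[str]], List[bool]],  # hypothetical: one verdict per criterion
    max_iterations: int,                             # stand-in for the full budget policy
) -> LoopResult:
    outputs: List[str] = []
    for i in range(1, max_iterations + 1):
        output = step(i)                     # Iteration: tool calls, tokens
        outputs.append(output)
        verdicts = critic(output, criteria)  # Critic: review against the goal's criteria
        if all(verdicts):                    # every criterion satisfied, so stop early
            return LoopResult("completed", i, outputs)
    return LoopResult("budget_exhausted", max_iterations, outputs)


# Usage: the first iteration satisfies one criterion, the second satisfies both.
criteria = ["prose", "snippet"]
result = run_loop(
    criteria,
    step=lambda i: "prose" if i == 1 else "prose + snippet",
    critic=lambda out, cs: [c in out for c in cs],
    max_iterations=5,
)
print(result.status, result.iterations)  # completed 2
```

Stopping on the critic's verdicts rather than on a fixed iteration count is what lets a satisfied goal end early while an unsatisfied one runs until the budget trips.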

1190 tests passing · 0 regressions · 56 specialists · 53 notebooks · 12 new tools in 1.0.7

Ten primitives · v1.0.6

Everything you need to run overnight.

Each feature is a small, composable primitive. Use one, or use all ten. Turn an agent into a long-running worker with four lines of code, and cut your Bedrock bill in half while you're at it.

Goal-driven · Budget-gated

Autopilot runtime

Runs until every success criterion is met or a budget trips. Wall-clock time, tool calls, tokens, dollars, and iterations are each honored independently. Every iteration writes an atomic checkpoint.
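A minimal sketch of what "independently honored" budgets and a per-iteration atomic checkpoint can look like. `Budget`, `Usage`, `tripped`, and `write_checkpoint` are hypothetical names; the only budget fields the original confirms are `BudgetPolicy(max_seconds=..., max_iterations=...)`. Each limit is checked on its own, so whichever trips first ends the run. The checkpoint is written to a temporary file and swapped in with `os.replace`, which is atomic on POSIX when both paths are on the same filesystem.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Budget:  # hypothetical; None means "no limit on this dimension"
    max_seconds: Optional[float] = None
    max_tool_calls: Optional[int] = None
    max_tokens: Optional[int] = None
    max_dollars: Optional[float] = None
    max_iterations: Optional[int] = None


@dataclass
class Usage:  # hypothetical running totals for one run
    seconds: float = 0.0
    tool_calls: int = 0
    tokens: int = 0
    dollars: float = 0.0
    iterations: int = 0


def tripped(budget: Budget, usage: Usage) -> List[str]:
    """Names of every dimension whose limit has been reached; each is checked on its own."""
    checks = {
        "seconds": (usage.seconds, budget.max_seconds),
        "tool_calls": (usage.tool_calls, budget.max_tool_calls),
        "tokens": (usage.tokens, budget.max_tokens),
        "dollars": (usage.dollars, budget.max_dollars),
        "iterations": (usage.iterations, budget.max_iterations),
    }
    return [name for name, (used, limit) in checks.items()
            if limit is not None and used >= limit]


def write_checkpoint(path: str, usage: Usage) -> None:
    """Write to a temp file beside `path`, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(asdict(usage), f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk before the swap
        os.replace(tmp, path)     # readers see the old file or the new one, never a mix
    finally:
        if os.path.exists(tmp):   # only left behind if something failed before the swap
            os.remove(tmp)


print(tripped(Budget(max_seconds=300, max_iterations=5), Usage(seconds=12.0, iterations=5)))
# ['iterations']
```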


Second-opinion reviewer

Reflection critic

Scores every iteration's output against the criteria and feeds suggestions back. Confidence-gated early termination means no more burning budget on goals that are already satisfied.

confidence 0.42 · keep iterating

gate: 0.75 · critic returns `[True, True]` when confidence ≥ gate
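A sketch of confidence-gated termination under stated assumptions: the hypothetical `CriticVerdict` carries a confidence score, one boolean per success criterion, and free-text suggestions. Here the run stops only when the score clears the gate (0.75 above) and every criterion passes; how shipit-agent's critic actually combines these signals is not specified on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CriticVerdict:  # hypothetical shape of one critic review
    confidence: float            # 0.0 .. 1.0
    criteria_met: List[bool]     # one verdict per success criterion
    suggestions: List[str] = field(default_factory=list)


def should_stop(verdict: CriticVerdict, gate: float = 0.75) -> bool:
    """Stop early only when the critic is confident AND every criterion passes."""
    return verdict.confidence >= gate and all(verdict.criteria_met)


def next_prompt(base: str, verdict: CriticVerdict) -> str:
    """Feed the critic's suggestions back into the next iteration's prompt."""
    if not verdict.suggestions:
        return base
    tips = "\n".join(f"- {s}" for s in verdict.suggestions)
    return f"{base}\n\nReviewer suggestions:\n{tips}"


print(should_stop(CriticVerdict(0.42, [True, False])))  # False: keep iterating
print(should_stop(CriticVerdict(0.81, [True, True])))   # True: goal satisfied
```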


Claude-Desktop-style deliverables

Artifacts

Code fences and markdown docs are auto-extracted on every iteration. Tools can push explicit deliverables via result metadata. Disk persistence is optional.
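Extracting code fences from an iteration's output is essentially a regular-expression pass. The sketch below is an illustration, not shipit-agent's extractor: `Artifact`, `extract_code_artifacts`, and `persist` are hypothetical, and the file-suffix mapping is an arbitrary choice for the example.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import List

FENCE = "`" * 3  # built at runtime so this example can itself live inside a fenced block


@dataclass
class Artifact:  # hypothetical
    language: str  # info string after the opening fence, "" if none
    content: str


_CODE_BLOCK = re.compile(
    re.escape(FENCE) + r"([\w+-]*)[^\n]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,  # let (.*?) span multiple lines
)


def extract_code_artifacts(text: str) -> List[Artifact]:
    """Pull every fenced code block out of one iteration's output."""
    return [Artifact(lang, body.rstrip("\n")) for lang, body in _CODE_BLOCK.findall(text)]


def persist(artifacts: List[Artifact], directory: Path) -> List[Path]:
    """Optional disk persistence: one file per artifact."""
    directory.mkdir(parents=True, exist_ok=True)
    suffixes = {"python": ".py", "sql": ".sql", "bash": ".sh"}
    paths = []
    for i, art in enumerate(artifacts):
        path = directory / f"artifact_{i}{suffixes.get(art.language, '.txt')}"
        path.write_text(art.content + "\n")
        paths.append(path)
    return paths
```

The non-greedy `(.*?)` keeps two adjacent blocks from being merged into one match.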


N goals in parallel

Parallel fan-out

`autopilot.fanout(items, template)` dispatches N child Autopilots concurrently. Each child gets a budget-scaled slice, so aggregate spend stays bounded.
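One way to read "budget-scaled slice" is to divide the parent's additive limits (tokens, dollars) by N and run the children concurrently. This sketch uses the standard library's `ThreadPoolExecutor`; `split_budget` and `fanout_sketch` are hypothetical and do not reproduce the real `autopilot.fanout(items, template)` signature.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def split_budget(parent: Dict[str, float], n: int) -> Dict[str, float]:
    """Equal share per child, so n children together never exceed the parent's limits."""
    if n <= 0:
        raise ValueError("need at least one child")
    return {name: limit / n for name, limit in parent.items()}


def fanout_sketch(
    items: List[str],
    template: Callable[[str, Dict[str, float]], str],  # hypothetical: runs one child goal
    parent_budget: Dict[str, float],                   # additive limits, e.g. tokens, dollars
    max_workers: int = 4,
) -> List[str]:
    child_budget = split_budget(parent_budget, len(items))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs children concurrently but returns results in input order
        return list(pool.map(lambda item: template(item, child_budget), items))


results = fanout_sketch(
    ["PR-101", "PR-102", "PR-103"],
    template=lambda pr, budget: f"{pr}: ${budget['dollars']:.2f}",
    parent_budget={"dollars": 3.0, "tokens": 300_000},
)
print(results)  # ['PR-101: $1.00', 'PR-102: $1.00', 'PR-103: $1.00']
```

Splitting only additive limits is deliberate: dividing a wall-clock limit by N would starve children that actually run in parallel.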


24-hour operation

Scheduler daemon

A persistent goal queue, drained tick by tick and crash-safe. The docs include systemd, launchd, and Docker recipes that turn any machine into an always-on host.

~/.shipit_agent/autopilot-queue.json · tick 0

nightly-review · pending

morning-status · pending

hourly-lint · pending
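A minimal crash-safe queue in the spirit of this description: goals live in a JSON file, each tick runs at most one pending goal, and the file is rewritten atomically after every state change. The JSON layout, the `tick` and `recover` helpers, and the `running`/`done`/`failed` statuses are assumptions (only `pending` appears above), and the sketch writes to a temporary directory rather than `~/.shipit_agent/autopilot-queue.json`.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def load_queue(path: Path) -> List[dict]:
    return json.loads(path.read_text()) if path.exists() else []


def save_queue(path: Path, goals: List[dict]) -> None:
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(goals, indent=2))
    os.replace(tmp, path)  # atomic swap: a crash never leaves a half-written queue


def recover(path: Path) -> None:
    """After a crash, put goals that were mid-run back in the queue."""
    goals = load_queue(path)
    for goal in goals:
        if goal["status"] == "running":
            goal["status"] = "pending"
    save_queue(path, goals)


def tick(path: Path, run_goal: Callable[[str], bool]) -> Optional[str]:
    """Run at most one pending goal per tick; return its name, or None if idle."""
    goals = load_queue(path)
    for goal in goals:
        if goal["status"] == "pending":
            goal["status"] = "running"
            save_queue(path, goals)  # persisted before work starts, so recover() can see it
            goal["status"] = "done" if run_goal(goal["name"]) else "failed"
            save_queue(path, goals)
            return goal["name"]
    return None


queue = Path(tempfile.mkdtemp()) / "autopilot-queue.json"
save_queue(queue, [{"name": n, "status": "pending"}
                   for n in ("nightly-review", "morning-status", "hourly-lint")])
recover(queue)  # a daemon would call this once at startup
print(tick(queue, run_goal=lambda name: True))  # nightly-review
```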


Prebuilt personas

47 role specialists

Dev, Debug, Design, PM, Sales, CS, and Marketing, plus 40 architect, reviewer, security, and devops roles. Every one now ships with run_code and ask_user_async.

generalist-developer · debugger · design-reviewer · product-manager · sales-outreach · customer-success · marketing-writer


Tiered LLM routing

CostRouter

Classifies each turn as easy, medium, or hard and routes it to the cheapest adequate model. Works as a drop-in LLM adapter. Typical savings over a 24-hour run: 50–70%.
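A toy version of tiered routing, to show the shape of the idea rather than CostRouter itself. The classifier here is an assumed keyword-and-length heuristic, and the model ids and relative costs are placeholders, not real models or prices; savings are measured against sending every turn to the hard tier.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    model: str            # hypothetical model id
    relative_cost: float  # placeholder cost units per 1k tokens, not real prices


TIERS = {
    "easy": Tier("small-model", 1.0),
    "medium": Tier("mid-model", 4.0),
    "hard": Tier("large-model", 15.0),
}
HARD_HINTS = ("architecture", "refactor", "debug", "migrate")


def classify(prompt: str) -> str:
    """Toy heuristic: keywords and length decide easy / medium / hard."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS) or len(text) > 2000:
        return "hard"
    if "code" in text or len(text) > 300:
        return "medium"
    return "easy"


def route(prompt: str) -> Tier:
    return TIERS[classify(prompt)]


def estimated_savings(prompts: List[str]) -> float:
    """Fraction saved versus sending every turn to the hard tier (equal tokens per turn)."""
    routed = sum(route(p).relative_cost for p in prompts)
    baseline = TIERS["hard"].relative_cost * len(prompts)
    return 1 - routed / baseline


turns = [
    "What time zone is UTC+2?",
    "Write code to parse a CSV header",
    "Migrate the billing SQL to the new schema",
]
print([classify(t) for t in turns])        # ['easy', 'medium', 'hard']
print(round(estimated_savings(turns), 3))  # 0.556
```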


Mid-run clarifications

Non-blocking ask_user

Halts cleanly into `awaiting_user`, writes the question to a side channel on disk, and resumes when `shipit answer <run_id> …` arrives.

channel: ~/.shipit_agent/askuser/…

question: which cloud provider?

PENDING · run halts into awaiting_user
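The halt-and-resume pattern can be sketched with plain files: the run records its question and returns an `awaiting_user` status instead of blocking, a later call writes the answer (the role `shipit answer <run_id> …` plays in the CLI), and the next resume picks it up. The per-run JSON layout, the helper names `ask`, `answer`, and `resume`, and the run id are all hypothetical; the sketch uses a temporary directory instead of `~/.shipit_agent/askuser/`.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def ask(channel: Path, run_id: str, question: str) -> str:
    """Record the question on disk and report that the run should halt."""
    channel.mkdir(parents=True, exist_ok=True)
    record = {"question": question, "answer": None}
    (channel / f"{run_id}.json").write_text(json.dumps(record))
    return "awaiting_user"


def answer(channel: Path, run_id: str, text: str) -> None:
    """What an out-of-band answer command would do: fill in the pending record."""
    path = channel / f"{run_id}.json"
    record = json.loads(path.read_text())
    record["answer"] = text
    path.write_text(json.dumps(record))


def resume(channel: Path, run_id: str) -> Optional[str]:
    """Return the answer if one has arrived, else None (the run stays halted)."""
    path = channel / f"{run_id}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())["answer"]


channel = Path(tempfile.mkdtemp()) / "askuser"
status = ask(channel, "run-42", "which cloud provider?")
print(status, resume(channel, "run-42"))  # awaiting_user None
answer(channel, "run-42", "AWS")
print(resume(channel, "run-42"))          # AWS
```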


Screenshots that can be seen

Vision on computer_use

Every screenshot carries base64-encoded PNG bytes plus a media_type, so a vision-capable LLM actually reasons over the pixels instead of reading a file path.

media_type: image/png · image_base64: iVBORw0KGg…
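A payload of this shape can be built with the standard library alone. `screenshot_payload` is a hypothetical helper whose two keys mirror the `media_type` and `image_base64` fields shown above. The eight-byte PNG signature stands in for a real capture, which is why the encoded string begins with `iVBORw0KGg`.

```python
import base64
from typing import Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def screenshot_payload(png_bytes: bytes) -> Dict[str, str]:
    """Wrap raw PNG bytes so a vision model receives pixels, not a file path."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("expected PNG bytes")
    return {
        "media_type": "image/png",
        "image_base64": base64.b64encode(png_bytes).decode("ascii"),
    }


payload = screenshot_payload(PNG_SIGNATURE)
print(payload["image_base64"])  # iVBORw0KGgo=
```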


Safe untrusted code

Docker sandbox

Setting sandbox=True on run_code spins up an ephemeral container with --network none, a read-only root filesystem, and a writable /tmp. It falls back gracefully when docker isn't on PATH.

image: python:3.11-slim (ephemeral)

--network none (instead of the default bridge)

rootfs: --read-only

/work:ro

sandbox=True · exit 0 · stderr empty
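The sandbox settings listed above correspond to standard `docker run` flags. This sketch only builds the command line and checks for docker with `shutil.which`; `build_sandbox_argv`, `sandbox_command`, and the `main.py` entry point are hypothetical, and run_code's actual invocation may differ.

```python
import shutil
from typing import List, Optional


def build_sandbox_argv(workdir: str, script: str = "main.py",
                       image: str = "python:3.11-slim") -> List[str]:
    """docker run flags matching the sandbox settings listed above."""
    return [
        "docker", "run", "--rm",      # ephemeral: container is removed on exit
        "--network", "none",          # no network access
        "--read-only",                # read-only root filesystem
        "--tmpfs", "/tmp",            # but a writable /tmp
        "-v", f"{workdir}:/work:ro",  # user code mounted read-only
        "-w", "/work",
        image, "python", script,
    ]


def sandbox_command(workdir: str, script: str = "main.py") -> Optional[List[str]]:
    """Return the docker argv, or None so the caller can fall back to a local run."""
    if shutil.which("docker") is None:
        return None
    return build_sandbox_argv(workdir, script)
```

To execute it, pass the list to `subprocess.run(argv, capture_output=True, text=True, timeout=60)` and inspect `returncode` and `stderr`; returning `None` rather than raising keeps the "no docker" fallback an ordinary branch in the caller.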


How to use

Five snippets. Five things you can ship today.

Run a goal under a budget. Start here.

```shell
$ pip install shipit-agent && export AWS_REGION=us-east-1
```

```python
from shipit_agent import Autopilot, BudgetPolicy, Goal
# build_llm_from_env is imported from the repository's examples/ directory;
# this assumes you run the script from a checkout where that package is importable.
from examples.run_multi_tool_agent import build_llm_from_env

llm = build_llm_from_env("bedrock")  # defaults to Llama 4 Scout

autopilot = Autopilot(
    llm=llm,
    goal=Goal(
        objective="Explain the Python GIL with a runnable snippet.",
        success_criteria=[
            "Two paragraphs of prose",
            "A Python snippet showing GIL behavior",
        ],
    ),
    # Runs until both criteria are met, 300 seconds elapse, or 5 iterations complete.
    budget=BudgetPolicy(max_seconds=300, max_iterations=5),
)
result = autopilot.run(run_id="gil-explainer")
print(result.status, result.iterations)
print(result.output)
```


Reference

Docs, tools, and runnable notebooks.


Ready to run overnight?

Install shipit-agent, pick a goal, set a budget. The runtime takes care of streaming, checkpoints, reflection, artifacts, fan-out, and resume.

pip install shipit-agent

v1.0.6 · 805 tests · 9 LLM providers · 47 specialists · 8 runnable notebooks
