v1.0.7 · Autopilot · Agents for every role · NEW

Long-running agents.
Stream every step.

Autopilot turns any agent into a budget-gated, checkpointed, goal-driven worker. Fan out across batches, reflect with a critic, collect artifacts, resume after crashes. Claude-Desktop-style autonomy — in Python.

1190 tests · 56 specialists · 53 notebooks
$ shipit autopilot "migrate SQL" --format tui
elapsed 0.0s · tools 0 · tokens 0.0k · criteria 0/3
run_started · Goal: migrate SQL calls · 3 criteria · 0.0s
The loop

Seven stages. One long-running brain.

Autopilot wraps the Agent in a loop that keeps working until every success criterion is satisfied — or a budget trips. Every stage is replaceable; every event streams live.

1. Goal: objective + criteria
2. Iteration: tool calls, tokens
3. Critic: confidence-gated review
4. Artifacts: code, markdown, files
5. Fan-out: N children in parallel
6. Daemon: 24h scheduler, resume
7. Specialists: 56 built-in roles
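The seven stages reduce to one loop: run an iteration, let the critic judge it against the criteria, and stop on success or when the budget runs out. A minimal plain-Python sketch, assuming a caller-supplied `work_step` and a `critic` that returns one verdict per criterion; `run_loop` and `LoopResult` are hypothetical names, not the shipit_agent API:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoopResult:
    status: str            # "completed" or "budget_exhausted"
    iterations: int
    satisfied: List[bool]  # latest critic verdict, one entry per criterion


def run_loop(
    criteria: List[str],
    work_step: Callable[[int], str],
    critic: Callable[[str, List[str]], List[bool]],
    max_iterations: int,
) -> LoopResult:
    """Iterate until every criterion passes or the iteration budget trips."""
    satisfied = [False] * len(criteria)
    for i in range(1, max_iterations + 1):
        output = work_step(i)                 # stage 2: tool calls, tokens
        satisfied = critic(output, criteria)  # stage 3: review against criteria
        if all(satisfied):
            return LoopResult("completed", i, satisfied)
    return LoopResult("budget_exhausted", max_iterations, satisfied)


# Usage: a fake worker that only adds code on its second iteration.
criteria = ["Two paragraphs of prose", "A Python snippet"]
result = run_loop(
    criteria,
    work_step=lambda i: "prose" if i == 1 else "prose + code",
    critic=lambda out, crit: ["prose" in out, "code" in out],
    max_iterations=5,
)
print(result.status, result.iterations)  # completed 2
```

Because `work_step` and `critic` are plain callables, each stage can be swapped independently, which is the property the loop description above emphasizes.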
1190 tests passing · 0 regressions · 56 specialists · 53 notebooks · 12 new tools in 1.0.7
Ten primitives · v1.0.6

Everything you need to run overnight.

Each feature is a small, composable primitive. Use one, or use all ten. Turn an agent into a long-running worker with four lines of code, and cut your Bedrock bill in half while you're at it.

Goal-driven · Budget-gated

Autopilot runtime

Runs until every success criterion is met or a budget trips. Wall-clock time, tool calls, tokens, dollars, and iterations are each enforced independently. Each iteration writes an atomic checkpoint.

seconds 0% · tool_calls 0% · tokens 0% · iterations 0%
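Independent budgets and atomic checkpoints can be sketched in a few lines. This assumes usage is tracked in a simple counter object and checkpoints are JSON files; `Budget`, `Usage`, `tripped`, and `write_checkpoint` are illustrative names, not shipit_agent's `BudgetPolicy`:

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Optional

LIMITS = ("seconds", "tool_calls", "tokens", "dollars", "iterations")


@dataclass
class Budget:
    # None means "no cap"; every limit that is set is checked on its own.
    max_seconds: Optional[float] = None
    max_tool_calls: Optional[int] = None
    max_tokens: Optional[int] = None
    max_dollars: Optional[float] = None
    max_iterations: Optional[int] = None


@dataclass
class Usage:
    seconds: float = 0.0
    tool_calls: int = 0
    tokens: int = 0
    dollars: float = 0.0
    iterations: int = 0


def tripped(budget: Budget, usage: Usage) -> Optional[str]:
    """Return the first exhausted limit's name, or None if every limit still has room."""
    for name in LIMITS:
        limit = getattr(budget, f"max_{name}")
        if limit is not None and getattr(usage, name) >= limit:
            return name
    return None


def write_checkpoint(path: str, usage: Usage) -> None:
    """Write to a temp file in the same directory, then os.replace() it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(asdict(usage), fh)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes reach disk before the rename
    os.replace(tmp, path)      # atomic on POSIX: readers see the old or new file, never half of one


budget = Budget(max_seconds=300, max_iterations=5, max_dollars=2.0)
usage = Usage(seconds=12.5, iterations=5, dollars=0.4)
print(tripped(budget, usage))  # iterations
```

The temp-file-plus-rename pattern is what makes the checkpoint atomic: a crash mid-write leaves the previous checkpoint untouched.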
Second-opinion reviewer

Reflection critic

Scores each iteration's output against the success criteria and feeds suggestions back into the next iteration. Confidence-gated early termination means no more burning budget on goals that are already satisfied.

confidence 0.42 · keep iterating
gate: 0.75 · critic returns `[True, True]` when confidence ≥ gate
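A sketch of the gating logic, using the 0.75 gate from the demo. It assumes the critic reports a single confidence score plus free-text suggestions (the page doesn't say how confidence is computed); `CriticVerdict`, `should_stop`, and `next_prompt` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CriticVerdict:
    confidence: float  # 0.0 to 1.0: how sure the critic is that the goal is met
    suggestions: List[str] = field(default_factory=list)


def should_stop(verdict: CriticVerdict, gate: float = 0.75) -> bool:
    """Confidence-gated early termination: stop only once confidence clears the gate."""
    return verdict.confidence >= gate


def next_prompt(base: str, verdict: CriticVerdict) -> str:
    """Feed the critic's suggestions back into the next iteration."""
    if not verdict.suggestions:
        return base
    bullets = "\n".join(f"- {s}" for s in verdict.suggestions)
    return f"{base}\n\nReviewer suggestions:\n{bullets}"


low = CriticVerdict(0.42, ["Add a runnable snippet that shows the GIL"])
print(should_stop(low))  # False, so keep iterating
print(next_prompt("Explain the Python GIL.", low))
```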
Claude-Desktop-style deliverables

Artifacts

Code fences and markdown docs are auto-extracted every iteration. Tools can push explicit deliverables through result metadata. Persisting artifacts to disk is optional.
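A rough sketch of the code-fence half of artifact extraction, assuming artifacts are plain fenced blocks in the model's text output (markdown-doc detection and tool-pushed metadata are not shown). `Artifact`, `extract_artifacts`, `persist`, and the `artifact_N.ext` naming are hypothetical:

```python
import os
import re
from dataclasses import dataclass
from typing import List

TICKS = "`" * 3  # built at runtime so this example can itself sit inside a fence
FENCE = re.compile(TICKS + r"([\w+-]*)\n(.*?)" + TICKS, re.DOTALL)
EXTENSIONS = {"python": "py", "py": "py", "sql": "sql", "markdown": "md", "md": "md"}


@dataclass
class Artifact:
    language: str  # fence language tag, "" when the fence is untagged
    content: str


def extract_artifacts(text: str) -> List[Artifact]:
    """Pull every fenced block out of one iteration's output."""
    return [Artifact(lang, body.rstrip("\n")) for lang, body in FENCE.findall(text)]


def persist(artifacts: List[Artifact], directory: str) -> List[str]:
    """Optional disk persistence: one file per artifact, extension chosen from the tag."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i, art in enumerate(artifacts, start=1):
        ext = EXTENSIONS.get(art.language.lower(), "txt")
        path = os.path.join(directory, f"artifact_{i}.{ext}")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(art.content + "\n")
        paths.append(path)
    return paths


output = f"Done.\n{TICKS}python\nprint('hi')\n{TICKS}\nThen:\n{TICKS}sql\nSELECT 1;\n{TICKS}\n"
for art in extract_artifacts(output):
    print(art.language, repr(art.content))
```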

N goals in parallel

Parallel fan-out

`autopilot.fanout(items, template)` dispatches N child Autopilots concurrently. Each child gets a budget-scaled slice, so aggregate spend stays bounded.

PR-101 0% · PR-102 0% · PR-103 0% · PR-104 0% · PR-105 0% · PR-106 0%
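A sketch of budget-scaled fan-out using a thread pool. It assumes additive limits (tokens, dollars) are divided evenly among the N children while wall-clock time is not, since the children run concurrently. `fanout`, `child_budget`, and this `Budget` are hypothetical and do not reflect the signature of `autopilot.fanout(items, template)`:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Budget:
    max_seconds: float  # wall-clock: children run side by side, so each keeps the full window
    max_tokens: int     # additive: divided among children
    max_dollars: float  # additive: divided among children


def child_budget(parent: Budget, n: int) -> Budget:
    """Scale the additive limits by 1/n so total child spend cannot exceed the parent's."""
    return replace(parent, max_tokens=parent.max_tokens // n, max_dollars=parent.max_dollars / n)


def fanout(
    items: List[str],
    run_child: Callable[[str, Budget], T],
    parent: Budget,
    max_workers: int = 4,
) -> List[T]:
    """Run one child per item concurrently; results come back in input order."""
    if not items:
        return []
    budget = child_budget(parent, len(items))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda item: run_child(item, budget), items))


parent = Budget(max_seconds=3600, max_tokens=600_000, max_dollars=6.0)
prs = [f"PR-{n}" for n in range(101, 107)]
print(fanout(prs, lambda pr, b: f"{pr}: ${b.max_dollars:.2f}, {b.max_tokens} tokens", parent))
```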
24-hour operation

Scheduler daemon

A persistent goal queue, drained tick by tick and crash-safe. The docs include systemd, launchd, and Docker recipes for turning any machine into an always-on host.

~/.shipit_agent/autopilot-queue.json · tick 0
nightly-review · pending
morning-status · pending
hourly-lint · pending
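The queue file's real schema isn't shown here, so this sketch assumes a JSON list of `{name, status}` records and drains one pending goal per tick. Writes go through a temp file and `os.replace`, and a `recover` step requeues goals that a crash left in `running`. All function names are hypothetical:

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Optional


def load_queue(path: str) -> List[Dict]:
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_queue(path: str, queue: List[Dict]) -> None:
    """Temp file + os.replace, so a crash mid-write never leaves a half-written queue."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(queue, fh, indent=2)
    os.replace(tmp, path)


def recover(path: str) -> None:
    """On startup, requeue anything a previous crash left in 'running'."""
    queue = load_queue(path)
    for goal in queue:
        if goal["status"] == "running":
            goal["status"] = "pending"
    save_queue(path, queue)


def tick(path: str, run_goal: Callable[[Dict], bool]) -> Optional[Dict]:
    """Drain one pending goal per tick; return it, or None when the queue is idle."""
    queue = load_queue(path)
    for goal in queue:
        if goal["status"] == "pending":
            goal["status"] = "running"
            save_queue(path, queue)  # persist before running so a crash is detectable
            goal["status"] = "done" if run_goal(goal) else "failed"
            save_queue(path, queue)
            return goal
    return None


queue_path = os.path.join(tempfile.mkdtemp(), "autopilot-queue.json")
save_queue(queue_path, [{"name": n, "status": "pending"}
                        for n in ("nightly-review", "morning-status", "hourly-lint")])
recover(queue_path)
print(tick(queue_path, run_goal=lambda goal: True))  # {'name': 'nightly-review', 'status': 'done'}
```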
Prebuilt personas

47 role specialists

Dev · Debug · Design · PM · Sales · CS · Marketing, plus 40 architect, reviewer, security, and devops roles. Every one now ships with `run_code` and `ask_user_async`.

generalist-developer · debugger · design-reviewer · product-manager · sales-outreach · customer-success · marketing-writer
Tiered LLM routing

CostRouter

Classifies each turn as easy, medium, or hard and routes it to the cheapest adequate model. Works as a drop-in LLM adapter. Typical savings over a 24-hour run: 50–70%.

easy 0 · medium 0 · hard 0 · est. savings 0%
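A toy version of tiered routing, meant to show the shape of the idea rather than CostRouter's actual classifier. Model names, prices, and the keyword/length heuristic are all placeholder assumptions, and savings are measured against sending every turn to the hard tier with equal tokens per turn:

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tier:
    model: str
    usd_per_mtok: float  # placeholder price per 1M tokens, not real Bedrock pricing


TIERS: Dict[str, Tier] = {
    "easy": Tier("small-model", 0.10),
    "medium": Tier("mid-model", 1.00),
    "hard": Tier("large-model", 5.00),
}
HARD_WORDS = {"prove", "refactor", "architecture", "debug", "migrate"}


def classify(prompt: str) -> str:
    """Toy heuristic: 'hard' keywords or very long prompts go to the top tier."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & HARD_WORDS or len(prompt) > 2000:
        return "hard"
    if len(prompt) > 200:
        return "medium"
    return "easy"


def route(prompt: str) -> Tier:
    return TIERS[classify(prompt)]


def estimated_savings(prompts: List[str]) -> float:
    """Fraction saved versus sending every turn to the hard tier."""
    if not prompts:
        return 0.0
    baseline = len(prompts) * TIERS["hard"].usd_per_mtok
    routed = sum(route(p).usd_per_mtok for p in prompts)
    return 1 - routed / baseline


turns = ["What's 2+2?", "Summarize: " + "x" * 300, "Migrate the SQL calls to the ORM"]
print([classify(t) for t in turns])       # ['easy', 'medium', 'hard']
print(f"{estimated_savings(turns):.0%}")  # 59%
```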
Mid-run clarifications

Non-blocking ask_user

Halts cleanly into `awaiting_user`, writes the question to a side channel on disk, and resumes when `shipit answer <run_id> …` arrives.

channel: ~/.shipit_agent/askuser/…
which cloud provider?
PENDING · run halts into awaiting_user
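The halt-and-resume flow can be modeled with two files per run: a question the run writes before halting, and an answer that something like `shipit answer` drops later. This is an assumption about the mechanism, not the real channel layout; `post_question`, `post_answer`, `read_answer`, and the `*.question.json` / `*.answer.json` names are hypothetical:

```python
import json
import os
import tempfile
from typing import Optional


def post_question(channel_dir: str, run_id: str, question: str) -> str:
    """Write the question to disk and return the state the run should halt into."""
    os.makedirs(channel_dir, exist_ok=True)
    with open(os.path.join(channel_dir, f"{run_id}.question.json"), "w", encoding="utf-8") as fh:
        json.dump({"run_id": run_id, "question": question}, fh)
    return "awaiting_user"


def post_answer(channel_dir: str, run_id: str, answer: str) -> None:
    """Roughly what an answer command could do: drop an answer file next to the question."""
    with open(os.path.join(channel_dir, f"{run_id}.answer.json"), "w", encoding="utf-8") as fh:
        json.dump({"run_id": run_id, "answer": answer}, fh)


def read_answer(channel_dir: str, run_id: str) -> Optional[str]:
    """On resume, return the answer if it has arrived; None means stay in awaiting_user."""
    path = os.path.join(channel_dir, f"{run_id}.answer.json")
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["answer"]


channel = tempfile.mkdtemp()
print(post_question(channel, "run-42", "which cloud provider?"))  # awaiting_user
print(read_answer(channel, "run-42"))                             # None
post_answer(channel, "run-42", "aws")
print(read_answer(channel, "run-42"))                             # aws
```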
Screenshots that can be seen

Vision on computer_use

Every screenshot carries base64-encoded PNG bytes plus a `media_type`, so a vision-capable LLM actually reasons over the pixels instead of reading a file path.

[captured screen]
media_type: image/png · image_base64: iVBORw0KGg…
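A minimal sketch of packaging a screenshot for a vision model. The `media_type` and `image_base64` keys mirror the demo above; how they are wrapped into a provider-specific message is not shown. `screenshot_payload` is a hypothetical helper:

```python
import base64
from typing import Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def screenshot_payload(png_bytes: bytes) -> Dict[str, str]:
    """Package raw PNG bytes so a vision-capable LLM receives pixels, not a file path."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("expected PNG data")
    return {
        "media_type": "image/png",
        "image_base64": base64.b64encode(png_bytes).decode("ascii"),
    }


# Stand-in bytes: the PNG signature plus the start of an IHDR chunk.
payload = screenshot_payload(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR")
print(payload["image_base64"][:10])  # iVBORw0KGg
```

Every PNG begins with the same 8-byte signature, which is why base64-encoded screenshots always start with `iVBORw0KGg`.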
Safe untrusted code

Docker sandbox

`sandbox=True` on `run_code` spins up an ephemeral container with `--network none`, a read-only root filesystem, and a writable `/tmp`. It falls back gracefully when `docker` isn't on `PATH`.

python:3.11-slim · ephemeral
--network none (not bridge)
rootfs --read-only
/work:ro
sandbox=True · exit 0 · stderr empty
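The flags below are standard `docker run` options (`--rm`, `--network none`, `--read-only`, `--tmpfs`, `-v …:ro`); how shipit_agent actually assembles them, and what its fallback does, isn't documented here. This sketch assumes the fallback runs the code locally and unsandboxed with a warning. `sandbox_argv` and `run_snippet` are hypothetical names, not the `run_code` tool's signature:

```python
import shutil
import subprocess
import sys
import warnings
from typing import List, Optional

_AUTO = object()  # sentinel meaning "look docker up on PATH"


def sandbox_argv(code: str, image: str = "python:3.11-slim", work_dir: Optional[str] = None) -> List[str]:
    """docker run flags for an ephemeral, offline, read-only container with a writable /tmp."""
    argv = ["docker", "run", "--rm",  # ephemeral: the container is deleted on exit
            "--network", "none",      # no network access
            "--read-only",            # read-only root filesystem
            "--tmpfs", "/tmp"]        # writable scratch space
    if work_dir:
        argv += ["-v", f"{work_dir}:/work:ro"]  # project files mounted read-only
    return argv + [image, "python", "-c", code]


def run_snippet(code: str, sandbox: bool = True, docker_path=_AUTO,
                timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run in Docker when requested and available; otherwise fall back to the local interpreter."""
    if docker_path is _AUTO:
        docker_path = shutil.which("docker")
    if sandbox and docker_path:
        argv = [docker_path] + sandbox_argv(code)[1:]
    else:
        if sandbox:
            warnings.warn("docker not found on PATH; running code WITHOUT a sandbox", RuntimeWarning)
        argv = [sys.executable, "-c", code]
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout)


print(" ".join(sandbox_argv("print(1)", work_dir="/srv/repo")))
```

A stricter policy could refuse to run instead of falling back; which behavior is right depends on how untrusted the code is.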
How to use

Five snippets. Five things you can ship today.

Run a goal under a budget — start here.

$ pip install shipit-agent && export AWS_REGION=us-east-1
from shipit_agent import Autopilot, BudgetPolicy, Goal
from examples.run_multi_tool_agent import build_llm_from_env  # helper from the examples/ package

llm = build_llm_from_env("bedrock")  # defaults to Llama 4 Scout

# The goal defines "done": an objective plus checkable success criteria.
autopilot = Autopilot(
    llm=llm,
    goal=Goal(
        objective="Explain the Python GIL with a runnable snippet.",
        success_criteria=[
            "Two paragraphs of prose",
            "A Python snippet showing GIL behavior",
        ],
    ),
    # Stop after 300 seconds or 5 iterations, whichever trips first.
    budget=BudgetPolicy(max_seconds=300, max_iterations=5),
)

# run_id identifies this run.
result = autopilot.run(run_id="gil-explainer")
print(result.status, result.iterations)
print(result.output)

Ready to run overnight?

Install shipit-agent, pick a goal, set a budget. The runtime takes care of streaming, checkpoints, reflection, artifacts, fan-out, and resume.

v1.0.6 · 805 tests · 9 LLM providers · 47 specialists · 8 runnable notebooks