Long-running agents.
Stream every step.
Autopilot turns any agent into a budget-gated, checkpointed, goal-driven worker. Fan out across batches, reflect with a critic, collect artifacts, resume after crashes. Claude-Desktop-style autonomy — in Python.
Seven stages. One long-running brain.
Autopilot wraps the Agent in a loop that keeps working until every success criterion is satisfied — or a budget trips. Every stage is replaceable; every event streams live.
Everything you need to run overnight.
Each feature is a small, composable primitive. Use one. Use all ten. Turn an agent into a long-running worker with four lines of code — and cut your Bedrock bill in half while you're at it.
Autopilot runtime
Runs until every success criterion is met OR a budget trips. Wall-clock, tool-calls, tokens, dollars, iterations — independently honored. Atomic checkpoint per iteration.
Reflection critic
Scores every iteration's output against the criteria. Feeds suggestions back. Confidence-gated early termination — no more burning budget on already-satisfied goals.
Artifacts
Code fences and markdown docs auto-extracted every iteration. Tools push explicit deliverables via result metadata. Optional disk persistence.
Parallel fan-out
autopilot.fanout(items, template) dispatches N child Autopilots concurrently. Each child gets a budget-scaled slice so aggregate spend stays bounded.
Scheduler daemon
A persistent goal queue drained tick-by-tick. Crash-safe. systemd / launchd / Docker recipes in the docs — turn any machine into an always-on host.
47 role specialists
Dev · Debug · Design · PM · Sales · CS · Marketing + 40 architect / reviewer / security / devops. Every one now ships with run_code + ask_user_async.
CostRouter
Classify each turn easy/medium/hard, route to the cheapest adequate model. Drop-in LLM adapter. Typical 24h savings 50–70%.
Non-blocking ask_user
Halt cleanly into `awaiting_user`, side-channel the question to disk, resume when `shipit answer <run_id> …` arrives.
Vision on computer_use
Every screenshot carries base64 PNG bytes + media_type — a vision-capable LLM actually reasons over the pixels instead of reading a file path.
Docker sandbox
sandbox=True on run_code spins an ephemeral container, --network none, read-only rootfs, writable /tmp. Graceful fallback when docker isn't on PATH.
Five snippets. Five things you can ship today.
Run a goal under a budget — start here.
from shipit_agent import Autopilot, BudgetPolicy, Goalfrom examples.run_multi_tool_agent import build_llm_from_envllm = build_llm_from_env("bedrock") # defaults to Llama 4 Scoutautopilot = Autopilot(llm=llm,goal=Goal(objective="Explain the Python GIL with a runnable snippet.",success_criteria=["Two paragraphs of prose","A Python snippet showing GIL behavior",],),budget=BudgetPolicy(max_seconds=300, max_iterations=5),)result = autopilot.run(run_id="gil-explainer")print(result.status, result.iterations)print(result.output)
Docs, tools, and runnable notebooks.
Autopilot guides
New in 1.0.6 tools
Runnable notebooks
Ready to run overnight?
Install shipit-agent, pick a goal, set a budget. The runtime takes care of streaming, checkpoints, reflection, artifacts, fan-out, and resume.