shipit-agent v1.0.8 · launch archive

Five flagship features
that beat the competition.

Validation retry that stays in your conversation. Process supervision in one constructor argument. ChatGPT-style memory, principled. Replay.io for AI agents. A self-hostable Devin. All five reachable from from shipit_agent import …

Looking for the latest? See what's new in v1.0.12 →

5
new modules
+318
new tests
1508
tests passing
5
docs pages
5
notebooks
0
regressions

Walk through every feature

Each feature has its own dedicated docs page, notebook, and ≥10 unit tests per public method. Click through for the full reference.

Feature 01 · Beats LangChain

Structured output with auto-retry

Pass a Pydantic model or JSON Schema. Get a typed result. On parse failure, the runtime retries inside the same conversation — no separate fixing LLM.

Why this beats the competition

LangChain's OutputFixingParser runs a fresh LLM call. Ours stays in the same conversation, so the model has full context of what it tried.

Read the full docs
structured-output.py
from pydantic import BaseModel
from shipit_agent import Agent
class Movie(BaseModel):
title: str
rating: float
result = agent.run(
"Recommend a thriller.",
output_schema=Movie,
max_validation_retries=2,
)
print(result.parsed) # Movie(title='Heat', rating=8.5)
Feature 02 · Process supervision in 1 line

Verifier network

A second cheap LLM vetoes hallucinated tool calls before they fire. Per-iteration progress check nudges stalling agents back on track.

Why this beats the competition

LangGraph's ToolNode has no per-call gating. LangChain's RunnableWithMessageHistory has no progress detector.

Read the full docs
verifier.py
from shipit_agent import Agent
from shipit_agent.verifier import VerifierNetwork
verifier = VerifierNetwork(
llm=haiku_llm, # cheap verifier
goal="Audit security of merged PRs",
)
agent = Agent(
llm=opus_llm,
tools=[GitHubTool(), GrepTool()],
verifier=verifier, # auto-wraps every tool
)
result = agent.run("Run the audit.")
Feature 03 · Better than ChatGPT Memories

Episodic memory consolidation

Distill conversations into durable facts. Forgetting curve so old facts decay. Frequently-retrieved facts promote to core memory in the system prompt.

Why this beats the competition

ChatGPT's Memories has no decay, no retrieval-based promotion, no auto-extraction. Ours has all three. Self-hostable.

Read the full docs
memory.py
from shipit_agent import MemoryConsolidator
c = MemoryConsolidator(llm=cheap_llm)
# After every 10 turns
c.consolidate(memory=mem,
recent_messages=mem.get_conversation_messages())
# Once a day
c.decay(mem.knowledge, half_life_days=14)
# Every turn — top-K facts → system prompt
core = c.core_memory(mem.knowledge, top_k=5)
Feature 04 · Replay.io for AI agents

Time-travel replay

Load any saved trace. Fork from any event. Edit the prompt. Resume on a fresh agent. Side-by-side diff of two runs.

Why this beats the competition

LangSmith Playground and Inngest branching are SaaS-only. Ours is library-level, open-source, works against your existing FileTraceStore.

Read the full docs
replay.py
from shipit_agent.replay import TraceReplayer
replayer = TraceReplayer.from_store(store, "run-abc")
# Fork at the bad iteration with a tweaked prompt
fork = replayer.fork(
at_event=12,
edit_user_message="Try a narrower question.",
)
# Resume on a fresh agent
result = fork.continue_from(agent=Agent(llm=opus_llm))
Feature 05 · Self-hosted Devin / Operator

ComputerUseAgent

Drive a browser by showing screenshots to a vision-capable LLM. Anthropic native computer-use + plain-text fallback for any vision LLM.

Why this beats the competition

Devin and OpenAI Operator are SaaS products. Ours is a library — self-host, plug into your own loop, fork the implementation. Mock browser for tests, Playwright for production.

Read the full docs
computer-use.py
from shipit_agent.computer_use import (
ComputerUseAgent, PlaywrightBrowserSession,
)
with PlaywrightBrowserSession.launch(headless=True) as browser:
agent = ComputerUseAgent(
llm=opus_llm,
browser=browser,
goal="Find iPhone 15 Pro price on apple.com.",
max_iterations=10,
)
result = agent.run()
print(result.final_text)

Side-by-side

How every flagship v1.0.8 feature stacks up against the closest equivalents in LangChain, LangGraph, and ChatGPT.

CapabilityLangChainLangGraphChatGPT / Operatorshipit_agent v1.0.8
Same-conversation validation retryseparate LLM call no no yes
Streaming partial JSON no nofrontend only yes
Per-tool-call veto by separate verifier no no no yes
Progress detector + auto-nudge no no no yes
Memory consolidation with forgetting curve no nobinary on/off yes
Time-travel replay (fork from any event) no no no yes
Self-host browser-driving agent no noOperator (SaaS) yes
Library-level API (not SaaS) yes yes no yes

Ship production agents.
On your own infra.

Five flagship features. 1508 tests. 0 regressions. Every line of code is yours to read, fork, and run on the LLM of your choice.

pip install 'shipit-agent[anthropic,playwright]'