shipit-agent v1.0.8 · launch archive

Five flagship features
that beat the competition.

Name: SHIPIT Agent
Author: SHIPIT

Validation retry that stays in your conversation. Process supervision in one constructor argument. ChatGPT-style memory, principled. Replay.io for AI agents. A self-hostable Devin. All five reachable from from shipit_agent import …

Walk through every feature Changelog 5 launch notebooks ↗

Looking for the latest? See what's new in v1.0.12 →

new modules

+318

new tests

1508

tests passing

docs pages

notebooks

regressions

Walk through every feature

Each feature has its own dedicated docs page, notebook, and ≥10 unit tests per public method. Click through for the full reference.

Feature 01 · Beats LangChain

Structured output with auto-retry

Pass a Pydantic model or JSON Schema. Get a typed result. On parse failure, the runtime retries inside the same conversation — no separate fixing LLM.

Why this beats the competition

LangChain's OutputFixingParser runs a fresh LLM call. Ours stays in the same conversation, so the model has full context of what it tried.

Read the full docs

structured-output.py

from pydantic import BaseModel
from shipit_agent import Agent
 
class Movie(BaseModel):
    title: str
    rating: float
 
result = agent.run(
    "Recommend a thriller.",
    output_schema=Movie,
    max_validation_retries=2,
)
print(result.parsed)  # Movie(title='Heat', rating=8.5)

Feature 02 · Process supervision in 1 line

Verifier network

A second cheap LLM vetoes hallucinated tool calls before they fire. Per-iteration progress check nudges stalling agents back on track.

Why this beats the competition

LangGraph's ToolNode has no per-call gating. LangChain's RunnableWithMessageHistory has no progress detector.

Read the full docs

verifier.py

from shipit_agent import Agent
from shipit_agent.verifier import VerifierNetwork
 
verifier = VerifierNetwork(
    llm=haiku_llm,                 # cheap verifier
    goal="Audit security of merged PRs",
)
 
agent = Agent(
    llm=opus_llm,
    tools=[GitHubTool(), GrepTool()],
    verifier=verifier,             # auto-wraps every tool
)
result = agent.run("Run the audit.")

Feature 03 · Better than ChatGPT Memories

Episodic memory consolidation

Distill conversations into durable facts. Forgetting curve so old facts decay. Frequently-retrieved facts promote to core memory in the system prompt.

Why this beats the competition

ChatGPT's Memories has no decay, no retrieval-based promotion, no auto-extraction. Ours has all three. Self-hostable.

Read the full docs

memory.py

from shipit_agent import MemoryConsolidator
 
c = MemoryConsolidator(llm=cheap_llm)
 
# After every 10 turns
c.consolidate(memory=mem,
              recent_messages=mem.get_conversation_messages())
 
# Once a day
c.decay(mem.knowledge, half_life_days=14)
 
# Every turn — top-K facts → system prompt
core = c.core_memory(mem.knowledge, top_k=5)

Feature 04 · Replay.io for AI agents

Time-travel replay

Load any saved trace. Fork from any event. Edit the prompt. Resume on a fresh agent. Side-by-side diff of two runs.

Why this beats the competition

LangSmith Playground and Inngest branching are SaaS-only. Ours is library-level, open-source, works against your existing FileTraceStore.

Read the full docs

replay.py

from shipit_agent.replay import TraceReplayer
 
replayer = TraceReplayer.from_store(store, "run-abc")
 
# Fork at the bad iteration with a tweaked prompt
fork = replayer.fork(
    at_event=12,
    edit_user_message="Try a narrower question.",
)
 
# Resume on a fresh agent
result = fork.continue_from(agent=Agent(llm=opus_llm))

Feature 05 · Self-hosted Devin / Operator

ComputerUseAgent

Drive a browser by showing screenshots to a vision-capable LLM. Anthropic native computer-use + plain-text fallback for any vision LLM.

Why this beats the competition

Devin and OpenAI Operator are SaaS products. Ours is a library — self-host, plug into your own loop, fork the implementation. Mock browser for tests, Playwright for production.

Read the full docs

computer-use.py

from shipit_agent.computer_use import (
    ComputerUseAgent, PlaywrightBrowserSession,
)
 
with PlaywrightBrowserSession.launch(headless=True) as browser:
    agent = ComputerUseAgent(
        llm=opus_llm,
        browser=browser,
        goal="Find iPhone 15 Pro price on apple.com.",
        max_iterations=10,
    )
    result = agent.run()
    print(result.final_text)

Side-by-side

How every flagship v1.0.8 feature stacks up against the closest equivalents in LangChain, LangGraph, and ChatGPT.

Capability	LangChain	LangGraph	ChatGPT / Operator	shipit_agent v1.0.8
Same-conversation validation retry	separate LLM call	no	no	yes
Streaming partial JSON	no	no	frontend only	yes
Per-tool-call veto by separate verifier	no	no	no	yes
Progress detector + auto-nudge	no	no	no	yes
Memory consolidation with forgetting curve	no	no	binary on/off	yes
Time-travel replay (fork from any event)	no	no	no	yes
Self-host browser-driving agent	no	no	Operator (SaaS)	yes
Library-level API (not SaaS)	yes	yes	no	yes

Ship production agents.
On your own infra.

Five flagship features. 1508 tests. 0 regressions. Every line of code is yours to read, fork, and run on the LLM of your choice.

Install shipit-agent Read the agent docs Star on GitHub

pip install 'shipit-agent[anthropic,playwright]'

Five flagship featuresthat beat the competition.

Walk through every feature

Structured output with auto-retry

Verifier network

Episodic memory consolidation

Time-travel replay

ComputerUseAgent

Side-by-side

Ship production agents.On your own infra.

Five flagship features
that beat the competition.

Ship production agents.
On your own infra.