Architecture

Name: SHIPIT Agent
Author: SHIPIT

How shipit_agent is built — runtime loop, layers, key invariants, and how every subsystem (RAG, deep agents, sessions, MCP) plugs in.

5 min read

20 sections

Edit this page

shipit_agent is built around a small, focused runtime with clean boundaries between concerns. There are no chains, no graphs, no mandatory inheritance hierarchies — just a runtime that executes LLM calls and tool calls with strong invariants on streaming, error recovery, and tool/result pairing.

Everything in this page is the result of reading the actual source — shipit_agent/runtime.py is one file you can hold in your head.

The big picture

bash

┌──────────────────────────────────────────────────┐
                    │                  user code                       │
                    │     Agent / DeepAgent / GoalAgent / ...          │
                    └──────────────────────┬───────────────────────────┘
                                           │
                       ┌───────────────────┼───────────────────┐
                       │                   │                   │
                       ▼                   ▼                   ▼
                ┌──────────┐         ┌───────────┐       ┌──────────────┐
                │ DeepAgent│         │ AgentChat │       │ shipit chat  │
                │ (factory)│         │  Session  │       │     CLI      │
                └────┬─────┘         └─────┬─────┘       └──────┬───────┘
                     │                     │                    │
                     ▼                     ▼                    ▼
                ┌──────────────────────────────────────────────────┐
                │                       Agent                      │
                │  llm · tools · mcps · prompt · policies · stores │
                │  rag · memory · session · trace · credentials    │
                └──────────────────────┬───────────────────────────┘
                                       │
                          ┌────────────▼────────────┐
                          │      AgentRuntime       │
                          │   run() / stream()      │
                          └────────────┬────────────┘
                                       │
        ┌───────────────┬──────────────┼──────────────┬──────────────┐
        ▼               ▼              ▼              ▼              ▼
  ┌─────────┐    ┌────────────┐  ┌──────────┐   ┌──────────┐   ┌──────────┐
  │   LLM   │    │    Tool    │  │   MCP    │   │   RAG    │   │  Stores  │
  │ Adapter │    │  Registry  │  │ Servers  │   │ subsystem│   │ session/ │
  └─────────┘    └─────┬──────┘  └────┬─────┘   └─────┬────┘   │  memory/ │
                       │              │               │        │  trace   │
                       ▼              ▼               ▼        └──────────┘
                 ┌──────────┐    ┌──────────┐  ┌────────────┐
                 │  Builtin │    │ Transport│  │ Vector +   │
                 │   Tools  │    │  layer   │  │ Keyword +  │
                 │   (30+)  │    │          │  │ Reranker   │
                 └──────────┘    └──────────┘  └────────────┘

The runtime loop

The heart of the library is AgentRuntime.run (and its streaming counterpart AgentRuntime.stream). Pseudo-code:

python

def run(user_prompt):
    state = RuntimeState()
    load_session()
    if rag is not None: rag.begin_run()           # source tracking starts here
    emit("run_started")

    if router_policy.auto_plan:
        run_planner()
        emit("planning_completed")  # injected as user-role message

    for iteration in range(1, max_iterations + 1):
        emit("step_started")
        compacted = maybe_compact_messages(state.messages, context_window_tokens)
        response = llm.complete(messages=compacted, tools=tool_schemas)
        track_usage(response)

        if response.reasoning_content:
            emit("reasoning_started")
            emit("reasoning_completed")

        if not response.tool_calls:
            break

        append_assistant_message_with_tool_uses(response)
        for tool_call in response.tool_calls:
            emit("tool_called")
            try:
                result = run_tool(tool_call)            # may use parallel pool
                append_tool_result_message(result)
                emit("tool_completed")
            except Exception as exc:
                append_synthetic_tool_error_message(exc)
                emit("tool_failed")

    if hit_iteration_cap:
        # one final summarisation turn with tools=[] so the answer
        # is never empty.
        response = llm.complete(tools=[])

    save_session()
    save_memory()
    if rag is not None:
        sources = rag.end_run()
        emit("rag_sources", sources=sources)
    emit("run_completed")
    return state, response

Real source: shipit_agent/runtime.py — readable end-to-end in a sitting.

Key invariants

These guarantees are what make the runtime predictable across providers and across long, multi-step runs.

1. Tool use/result pairing

Every toolUse block in an assistant turn is matched by exactly one toolResult block in the next user turn. This is enforced unconditionally:

Outcome	Result message appended
Tool succeeds	Real `ToolOutput` content
Tool raises a retryable error	Retry loop, then real or synthetic error
Tool raises non-retryable error	Synthetic `"Error: …"` message
Model hallucinates an unknown tool name	Synthetic `"Error: tool X is not registered"`
Planner runs	Output injected as `role="user"` context — never as `role="tool"`

This is why Bedrock's strict Converse API works reliably across multi-iteration tool loops.

2. Reasoning extraction

The LLM adapter populates LLMResponse.reasoning_content from whatever shape the provider returns (OpenAI o-series, Anthropic extended thinking, Bedrock gpt-oss, DeepSeek R1, …). The runtime emits reasoning_started / reasoning_completed events automatically — no configuration required.

3. Events are immutable, ordered, and incremental

Every event is a frozen AgentEvent dataclass. The stream is strictly ordered: events are yielded in emission order with no reordering, no deduplication, no batching.

stream() runs the runtime on a background daemon thread and yields events via a queue.Queue so each event arrives the instant it's emitted. Worker exceptions are re-raised on the consumer side.

4. Tool/result pairing extends to parallel execution

When parallel_tool_execution=True, the runtime fans tool calls out across a ThreadPoolExecutor. Results are collected and appended in the same order as the original tool_calls list, so pairing is still guaranteed.

5. RAG source tracking is per-run and thread-local

Agent.run calls rag.begin_run() at the top and rag.end_run() at the bottom. The tracker uses thread-local state so concurrent runs on different threads don't bleed citations into each other. The captured RAGSource list is attached to result.rag_sources and (in streaming mode) emitted as a final rag_sources event.

6. The runtime is the only thing that talks to the LLM

Every other subsystem (RAG, deep agents, hooks, sessions, …) goes through the runtime. There is no second code path. This is what keeps the public surface coherent.

Layered composition

Agent types stack on top of each other. Reading bottom-up:

Layer	Class	What it adds
1	`LLM` adapter	Provider-specific request shaping + reasoning extraction
2	`AgentRuntime`	The loop above — events, pairing, retries, compaction
3	`Agent`	High-level facade — tools, RAG, memory, sessions, hooks
4	`AgentChatSession`	Multi-turn chat over a single agent + session store
5	`GoalAgent` / `ReflectiveAgent` / `AdaptiveAgent` / `Supervisor` / `PersistentAgent`	Specialised behaviours that wrap an inner `Agent`
6	`DeepAgent`	Power-user factory — bundles seven deep tools, an opinionated prompt, and one-flag access to verification, reflection, goal mode, sub-agents
7	`shipit chat` REPL	Live multi-agent terminal CLI on top of any of the above

Each layer is independent: you can drop straight in at layer 3 (Agent) without ever touching layer 6, or you can chain the layers arbitrarily — DeepAgent.with_builtins(agents=[GoalAgent(...), DeepAgent(...)]) is a perfectly valid topology.

Subsystem snapshot

Tools

Tool is a 4-method protocol (name, description, schema, run). The ToolRegistry looks up tools by name; ToolRunner executes a single call with a ToolContext (prompt, metadata, state, session id). Built-in tools live under shipit_agent/tools/. Tool creation at runtime is supported by AdaptiveAgent.

MCP

MCPServer wraps a transport (MCPSubprocessTransport, MCPHTTPTransport, PersistentMCPSubprocessTransport, RemoteMCPServer) and exposes its tools through the same Tool protocol. Discovery failures log a warning and continue — they don't crash the agent.

RAG (Super RAG)

shipit_agent.rag is a self-contained subsystem with its own VectorStore, KeywordStore, Embedder, and Reranker protocols. The RAG facade ties them together with a HybridSearchPipeline (vector + BM25 + RRF + optional rerank + context expansion). When you pass rag= to Agent, three tools are auto-wired and a per-run SourceTracker captures every retrieved chunk into AgentResult.rag_sources. Adapters in shipit_agent.rag.adapters (DRK_CACHE, Chroma, Qdrant, pgvector) plug into the same protocols. See Super RAG.

Stores

Three orthogonal store protocols, each with InMemory* and File* implementations:

Store	Stores	Used by
`SessionStore`	`SessionRecord` (messages + metadata)	`AgentChatSession`, multi-turn chat
`MemoryStore`	`MemoryFact` (timestamped knowledge)	Long-term memory
`TraceStore`	`TraceRecord` (event audit log)	Production observability
`CredentialStore`	`CredentialRecord` (OAuth tokens, API keys)	Connector tools

All four are dataclass-backed, JSON-serialisable, and pluggable.

Hooks

AgentHooks exposes before_llm, after_llm, before_tool, and after_tool. Use it for custom logging, redaction, instrumentation, and side-channels. Hooks fire on the runtime thread; long-running work should be deferred to a queue.

Policies

Policy	Purpose
`RetryPolicy`	Per-LLM-call and per-tool-call retry config
`RouterPolicy`	`auto_plan`, `use_tool_search`, `tool_search_top_k`

Module layout