Changelog
v1.0.4 — 2026-04-12
Skills, tools, and runtime power-up. All 32 tool prompts rewritten with decision trees and anti-patterns. Full skill-to-tool linking for all 37 packaged skills. Automatic iteration boost for skill-driven workflows. Expanded bash allowlist (50+ commands). Streaming, chat, and project-building examples across 3 notebooks. Comprehensive docstrings across every key module. 32 skill tests. All passing.
Skills — Full Tool Linking
- 37 skill tool bundles (up from 10) — every packaged skill now declares the built-in tools it needs. When a skill is selected, the agent auto-attaches the right tools.
- Shared tool groups (
_FILE_CORE,_CODE_CORE,_WEB_CORE) reduce duplication across bundles. validate_tool_bundles()— new helper that checks every tool name inSKILL_TOOL_BUNDLESagainst the real builtin map.
Agent — Iteration Boost & Efficiency
_effective_max_iterations()— auto-boosts 4 → 8 when skills inject extra tools so skill-driven workflows can complete without cutting off early.- Single skill computation —
run()andstream()now compute skills once and reuse (previously 3x per call).
Tool Prompts — All 32 Upgraded
Every tool's prompt.py rewritten with decision trees, anti-patterns, workflow guidance, and cross-tool coordination.
Bash Allowlist Expansion
- 50+ safe commands added:
mkdir,touch,cp,mv,echo,grep,curl,docker,kubectl,terraform,aws,go,cargo,npx,tsc,eslint,black,isort,tree,awk,cut,diff, and more.
Documentation
- Comprehensive docstrings on
agent.py,builtins.py,skills/loader.py,skills/registry.py,skills/tool_bundles.py,deep_agent/factory.py. - 6 tool doc pages updated with enhanced prompts.
- Skills guide expanded with 7 real-world examples, streaming sections, chat sessions, and event type reference.
- Notebook 27 rewritten (38 cells): streaming, chat streaming, project build, web scraping, DeepAgent chat.
- Notebook 29 (new): DeepAgent + skills + memory + verify + reflect + sub-agents + streaming.
- Notebook 30 (new): real-world full project build across 6 steps with 5 different skills.
Tests
- 15 new tests (17 → 32 total): iteration boost, bundle validation, chat sessions, streaming, chat streaming, memory + skills, DeepAgent chat/stream.
v1.0.3 — 2026-04-11
Major feature release. Super RAG subsystem, DeepAgent factory (verify / reflect / goal / sub-agents), live multi-agent chat REPL (shipit chat), Agent memory cookbook, plus deep docs + notebook coverage. 521 unit tests. 19 Bedrock end-to-end smoke tests. All passing.
Super RAG
shipit_agent.ragsubsystem — pluggable chunker + embedder + vector store + keyword store + hybrid pipeline (vector + BM25 + RRF + recency bias + rerank + context expansion).rag=on every agent type — auto-wiresrag_search/rag_fetch_chunk/rag_list_sourcestools, augments the system prompt with citation instructions, and attachesresult.rag_sourceswith stable[N]citation indices.- Adapters —
DrkCacheVectorStore(pgvector over psycopg2) + lazy Chroma / Qdrant / pgvector. - Thread-local per-run source tracker so concurrent runs never leak citations.
DeepAgent
shipit_agent.deep.DeepAgent— power-user factory bundling seven deep tools:plan_task,decompose_problem,workspace_files,sub_agent,synthesize_evidence,decision_matrix,verify_output. Guide- One-flag power features:
verify=True,reflect=True,goal=Goal(...),rag=RAG(...),memory=AgentMemory(...). agents=sub-agent delegation — plug any mix of agent types as named delegates via a built-indelegate_to_agenttool.create_deep_agent()functional helper — auto-wraps plain Python callables as tools.- Nested event streaming — sub-agent events surface inside
tool_completed.metadata['events'].
Live chat REPL
shipit chat— modern multi-agent terminal REPL. Switch agent types live, index files mid-session, save/load conversations, togglereflect/verify, inspect tools and sources. Guide- Rich slash commands:
/agent,/agents,/tools,/sources,/index,/rag,/goal,/reflect,/verify,/history,/save,/load,/reset,/info, … - Pluggable LLM provider via
--provider; persistent sessions via--session-dir.
Streaming
DeepAgent.stream()covers every execution mode (direct, verified, reflective, goal-driven, sub-agent delegation).PersistentAgent.stream()added with per-step checkpointing.rag_sourcesevent type added — emitted after every RAG-backed run.
Memory
- Dedicated Agent → Memory cookbook explaining the two memory systems (
memory_store=for the LLM'smemorytool vsAgentMemoryfor application-curated profiles). Guide - DeepAgent auto-hydration —
memory=AgentMemory(...)seeds the inner agent'shistoryfrom the conversation summary. - Notebook 26 — runnable end-to-end tour.
Docs
- New Agent section (6 pages): Overview, Examples, Streaming, With RAG, With Tools, Memory, Sessions.
- New Super RAG section (6 pages): Overview, Standalone, Files & Chunks, With Agent, With Deep Agents, Adapters, API.
- New DeepAgent page. Reference
- Parameters Reference — every constructor parameter for every agent type and key class. Reference
- Updated Architecture + Model Adapters reference pages.
- Updated quickstart with Agent / Deep Agent / RAG sections.
- Updated FAQ with "Agent types — which one should I use?".
- 5 new notebooks (22–26): RAG basics, RAG + Agent, RAG + Deep Agents, DeepAgent chat, Agent memory.
- Full-width docs layout + collapsible TOC with floating toggle, persistence via localStorage.
Build
shipit-chatscript entry point.- Granular extras:
rag,rag-openai,rag-cohere,rag-chroma,rag-qdrant,rag-pgvector,rag-drk-cache,rag-pdf,rag-docx,rag-rerank-cohere,rag-rerank-cross-encoder, plusbedrock,google,groq,together,ollama. Theallextra bundles everything.
Fixed
- Tool schema format bug —
RAGSearchTool,RAGFetchChunkTool,RAGListSourcesTool,WebhookPayloadToolnow use the wrapped{"type": "function", "function": {...}}shape. Previously they were returning flat dicts and Bedrock's Converse API was rejecting them with empty-name validation errors. New regression test scans every tool for Bedrock compatibility. memory=AgentMemorytype coercion —DeepAgentandGoalAgentno longer auto-assignAgentMemory.knowledge(aSemanticMemory) intomemory_store=(which expects aMemoryStore).memory=now only seedshistory; users passmemory_store=explicitly for the runtime'smemorytool.Agent.with_builtins(tools=[...])keyword collision — the method now accepts and merges usertools=with the builtin catalogue (last-write-wins on name collision).AgentDelegationToolstreaming — uses inner agent'sstream()and packs events intotool_completed.metadata['events'].
Test coverage
- 521 unit tests (up from 285) — green.
- 19 end-to-end Bedrock smoke tests in
scripts/smoke_bedrock_e2e.pycover every public surface end-to-end against real Bedrock.
v1.0.2 — 2026-04-10
Major feature release. Deep agents, structured output, pipelines, agent teams, advanced memory, output parsers, and runtime power features. 285 tests. 12 examples. 8 notebooks. 13 new doc pages.
Deep Agents
- GoalAgent — Autonomous goal decomposition with success criteria, streaming, and
.with_builtins(). Guide - ReflectiveAgent — Self-evaluation with quality scores and revision loop. Guide
- Supervisor / Worker — Hierarchical delegation with quality review. Guide
- AdaptiveAgent — Runtime tool creation from Python code. Guide
- PersistentAgent — Checkpoint and resume across sessions. Guide
- Channel / AgentMessage — Typed agent-to-agent communication. Guide
- AgentBenchmark — Systematic agent testing framework. Guide
- Deep Agents API Reference — Full constructor, method, and return type docs. Reference
Structured Output & Parsers
output_schemaon Agent.run() — Pydantic models + JSON schemas. Guide- JSONParser, PydanticParser, RegexParser, MarkdownParser. Guide
Composition
- Pipeline — Sequential, parallel, conditional, function steps, streaming. Guide
- AgentTeam — LLM-routed multi-agent coordination with streaming. Guide
Advanced Memory
- ConversationMemory — buffer/window/summary/token strategies. Guide
- SemanticMemory — Embedding-based vector search. Guide
- EntityMemory — Track people, projects, concepts. Guide
- AgentMemory — Unified interface with
.default(). Guide
Runtime Power Features
- Parallel tool execution. Guide
- Graceful tool failure. Guide
- Context window management. Guide
- Hooks & middleware. Guide
- Mid-run re-planning. Guide
- Async runtime. Guide
- Transient error auto-retry (429/500/503).
Changed
- Selective memory storage (breaking) — Only
persist=Truetool results stored. - Safer retry defaults —
(ConnectionError, TimeoutError, OSError)instead of(Exception,).
v1.0.1 — 2026-04-09
Maintenance release. Bug fix in the tool runner plus repo hygiene, contributor experience, and CI hardening. Strongly recommended upgrade from 1.0.0 if you use Bedrock gpt-oss-120b.
Fixed
ToolRunnerargument collision — FixedTypeError: got multiple values for argument 'context'when an LLM (notablybedrock/openai.gpt-oss-120b-1:0) emitscontextas a tool-call argument. The runner now strips reserved argument names (context,self) from tool-call arguments before forwarding. Affects every built-in tool.
Added
CHANGELOG.mdat repo root in Keep a Changelog formatCONTRIBUTING.mdwith dev setup, commit conventions, PR checklist, and "how to add a new LLM adapter / tool" guides- GitHub issue templates — structured bug report, feature request, and config forms
- PR template with 12-item verification checklist
- Test CI —
pytest -qon Python 3.11 + 3.12 × Ubuntu + macOS (4 matrix cells), with smoke-test of all 11 LLM adapter imports - Gitleaks secret scanning CI with SARIF upload to GitHub Security tab, inline PR comments, Actions summary
- Pre-commit hooks — trailing whitespace, EOF fixer, YAML/TOML validation, gitleaks v8.21.2, ruff lint + format
- Gitleaks allowlist for runtime tool outputs (scraped HTML contains false-positive "API keys" like Pushly domainKeys)
Changed
.gitignorerewritten to dedupe entries and cover all runtime directories (site/,.eggs/,pip-wheel-metadata/)- Runtime tool outputs untracked from git (
sessions/,traces/,memory.json,.shipit_notebooks/**) — they were accidentally committed in 1.0.0
Security
- Added CI and pre-commit secret scanning to prevent future credential leaks
- No runtime code changed —
shipit_agent/module is byte-identical to 1.0.0
v1.0.0 — 2026-04-09
First stable release. Focused on making the agent loop observable, interchangeable, and out of the way.
🧠 Live reasoning / thinking events
LLMResponse.reasoning_contentfield added to carry thinking/reasoning blocks from any provider- New
_extract_reasoning()helper handles three shapes:- Flat
reasoning_contenton the response message (OpenAI o-series,gpt-oss, DeepSeek R1, Anthropic via LiteLLM) - Anthropic
thinking_blocks[*].thinking(Claude extended thinking) model_dump()fallback for pydantic dumps
- Flat
- Runtime emits
reasoning_started+reasoning_completedevents whenever reasoning content is non-empty - All three LLM adapters —
OpenAIChatLLM,AnthropicChatLLM,LiteLLMChatLLM/BedrockChatLLM— share the extraction helper OpenAIChatLLMauto-passesreasoning_effort="medium"for reasoning-capable models (o1*,o3*,o4*,gpt-5*,deepseek-r1*)AnthropicChatLLMsupportsthinking_budget_tokens=Nto enable Claude extended thinking
⚡ Truly incremental streaming
agent.stream()now runs the agent on a background daemon thread- Events are pushed through a thread-safe
queue.Queueas they're emitted - Consumer loop yields events the instant they happen — no buffering, no batched delivery
- Worker exceptions are captured and re-raised on the consumer thread
- Works in Jupyter, VS Code, JupyterLab, WebSocket/SSE transports, and plain terminals
🛡️ Bulletproof Bedrock tool pairing
- Planner output is now injected as a
user-role context message rather than an orphanrole="tool"message — fixes Bedrock's "number of toolResult blocks exceeds number of toolUse blocks" error - Every
response.tool_callsentry gets a tool-result message unconditionally:- Success → real tool-result
- Retry → retries first, then final result or error
- Unknown tool → synthetic
"Error: tool X is not registered"tool-result
- Stable
call_{iteration}_{index}tool_call_ids round-trip through message metadata - Multi-iteration tool loops on Bedrock Claude, gpt-oss, and Anthropic native now work without
modify_paramsband-aids
🔑 Zero-friction provider switching
build_llm_from_env()walks upward from CWD to discover.env, so notebooks and scripts work regardless of where they're launched from- Seven providers:
openai,anthropic,bedrock,gemini,vertex,groq,together,ollama, plus a genericlitellmprovider - Per-provider credential validation with clear error messages
SHIPIT_OPENAI_TOOL_CHOICE=requiredenv var to force tool use on lazy models likegpt-4o-mini
🌐 In-process Playwright for open_url
OpenURLToolnow uses Playwright's sync Chromium directly (headless, realistic desktop Chrome UA, 1280×800 viewport)- Handles JS-rendered pages, anti-bot 503s, modern TLS/ALPN
- Stdlib
urllibfallback when Playwright is not installed — zero third-party HTTP dependencies in the core fallback path - Errors never raise out of the tool: they return as
ToolOutputwith awarningslist in metadata - Rich metadata:
fetch_method,status_code,final_url,title
🔍 Upgraded ToolSearchTool
- Replaced binary substring match with drk_cache-style fuzzy scoring:
SequenceMatcher.ratio() + 0.12 × token_hits - Configurable
limitparameter, clamped to[1, max_limit] - New init kwargs:
max_limit,default_limit,token_bonus - Structured error output for empty queries
- Ranked output with scores and "when to use" hints from
prompt_instructions - Noise filter: results below
score=0.05dropped
🪵 Full event taxonomy
14 distinct event types with documented payloads:
run_started, mcp_attached, planning_started, planning_completed, step_started, reasoning_started, reasoning_completed, tool_called, tool_completed, tool_retry, tool_failed, llm_retry, interactive_request, run_completed
🔁 Iteration-cap summarization fallback
- If the model is still calling tools when
max_iterationsis reached, the runtime gives it one more turn withtools=[]to force a natural-language summary run_completedis never empty for normal runs- Guarded with try/except so summarization failures can't mask the rest of the run
Other changes
pyproject.toml:[project.urls]now points to correct GitHub org, addsDocumentationandChangeloglinks.env.example: expanded with all new env vars documentednotebooks/04_agent_streaming_packets.ipynb: full rewrite with .env loading, credential visibility printer, and live Markdown updatesREADME.md: new v1.0 release section with 8 headline features- Full MkDocs Material documentation site at shipiit.github.io/shipit_agent
Breaking changes
None — this is the first stable release. Subsequent 1.x releases will maintain backward compatibility within the 1.x line.