Self-host Devin.
In thirty lines of Python.
Drive a real browser by showing screenshots to a vision-capable LLM. Use it standalone for one-shot workflows, or plug it into your main Agent as a `browser_use` tool. Native Anthropic support, plus a plain-text fallback for any other vision LLM.
Four steps. Endlessly composable.
The same loop drives a 4-iteration price lookup or a 30-iteration form-filling workflow. Recovery from failed actions is built in.
Screenshot
Take a base64 PNG of the current viewport.
Reason
Vision LLM looks at the screenshot + the goal, picks the next action.
Act
Click, type, scroll, navigate, or signal `done`.
Loop
Repeat until the model emits `done` or the loop reaches `max_iterations`.
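The four steps above can be sketched as a plain loop. This is a minimal illustration rather than the library's implementation: `Action`, `FakeBrowser`, `run_loop`, and `scripted_model` are hypothetical names, the browser is a stand-in that records actions instead of driving a page, and the "model" is a fixed script in place of a vision LLM.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class Action:
    """One step chosen by the model (hypothetical shape)."""
    kind: str  # "click", "type", "scroll", "navigate", or "done"
    payload: dict = field(default_factory=dict)


class FakeBrowser:
    """Hypothetical stand-in for a real browser session."""

    def __init__(self) -> None:
        self.performed: list[Action] = []

    def screenshot_b64(self) -> str:
        # A real session would return the viewport as PNG bytes.
        return base64.b64encode(b"\x89PNG fake viewport").decode("ascii")

    def perform(self, action: Action) -> None:
        # A real session would click, type, scroll, or navigate here.
        self.performed.append(action)


def run_loop(browser, choose_action, goal, max_iterations=10):
    """Screenshot -> Reason -> Act, until `done` or the iteration cap."""
    history = []
    for _ in range(max_iterations):
        shot = browser.screenshot_b64()              # 1. Screenshot
        action = choose_action(shot, goal, history)  # 2. Reason
        history.append(action)
        if action.kind == "done":                    # 4. Loop exit
            break
        browser.perform(action)                      # 3. Act
    return history


def scripted_model(shot, goal, history):
    """Placeholder for the vision LLM: replays a fixed script."""
    script = [
        Action("navigate", {"url": "https://example.com"}),
        Action("click", {"x": 10, "y": 20}),
        Action("done", {"text": "finished"}),
    ]
    return script[len(history)]
```

Passing the history back into `choose_action` is what would let a real model notice that an earlier action failed and try something else on the next iteration.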
Standalone or as a tool inside your main Agent.
For one-shot browser work, run ComputerUseAgent directly. For production agents that mix browser work with web search, PDFs, RAG, or SQL, plug it in as BrowserAgentTool.
Standalone ComputerUseAgent
    from shipit_agent.computer_use import (
        ComputerUseAgent,
        PlaywrightBrowserSession,
    )

    with PlaywrightBrowserSession.launch(headless=True) as browser:
        agent = ComputerUseAgent(
            llm=opus_llm,
            browser=browser,
            goal="Find iPhone 15 Pro starting price.",
            max_iterations=10,
        )
        result = agent.run()
        print(result.final_text)
BrowserAgentTool inside main Agent
    from shipit_agent import Agent, VerifierNetwork
    from shipit_agent.computer_use import (
        BrowserAgentTool,
        PlaywrightBrowserSession,
    )

    # WebSearchTool, PDFTool, opus_llm, and haiku_llm are assumed to be
    # defined or imported elsewhere in your project.

    # 1. Browser tool: owns its own LLM + browser factory
    browser_tool = BrowserAgentTool(
        llm=opus_llm,
        browser_factory=lambda: PlaywrightBrowserSession.launch(headless=True),
        max_iterations=12,
    )

    # 2. Optional: verifier so destructive actions get gated
    verifier = VerifierNetwork(llm=haiku_llm, goal="Research only, no purchases.")

    # 3. Plug into your main planning Agent: browser_use is one tool among many
    agent = Agent(
        llm=opus_llm,
        tools=[browser_tool, WebSearchTool(), PDFTool()],
        verifier=verifier,
    )

    # 4. Run a high-level goal: the main agent decides when to call browser_use
    result = agent.run(
        "Find the cheapest direct SFO-JFK flight on May 20 "
        "and summarise the booking page."
    )
Real recipes
Production patterns where browser-driving agents actually pay off. Drop the goal into your main Agent and let `browser_use` handle the click-paths.
Price comparison
"Find the lowest price for [item] across [3 sites] and report the cheapest with a link."
Headless browser visits each site, extracts price, returns a structured comparison. Pair with `output_schema=PriceCompare` for typed output.
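As an illustration of what a `PriceCompare` schema might look like, here is a sketch using standard-library dataclasses. The field names, the `PriceQuote` helper, and the `cheapest` property are assumptions made for this example; the source does not say whether `output_schema` expects a dataclass, a Pydantic model, or something else, so check that before adapting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceQuote:
    """One site's offer (hypothetical fields)."""
    site: str
    price: float
    currency: str
    url: str


@dataclass(frozen=True)
class PriceCompare:
    """Structured comparison across sites (hypothetical schema)."""
    item: str
    quotes: tuple[PriceQuote, ...]

    @property
    def cheapest(self) -> PriceQuote:
        # Assumes every quote uses the same currency; raises ValueError
        # if no quotes were collected.
        return min(self.quotes, key=lambda q: q.price)


# Made-up sample data, purely to show the shape.
sample = PriceCompare(
    item="example item",
    quotes=(
        PriceQuote("shop-a", 19.99, "USD", "https://example.com/a"),
        PriceQuote("shop-b", 17.49, "USD", "https://example.com/b"),
        PriceQuote("shop-c", 18.00, "USD", "https://example.com/c"),
    ),
)
```

Keeping the link on each quote means the "cheapest with a link" part of the goal falls out of `sample.cheapest.url` without a second lookup.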
Form filling at scale
"Fill the application form with [name=…, email=…, message=…]. Pause before submitting."
The run completes when the agent reaches the Submit button. A human reviews the screenshot in `result.action_history[-1]`, then clicks Submit manually.
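The human-review step could be wired up roughly like this. The sketch assumes the last history entry gives you the screenshot as a base64-encoded PNG string; the exact shape of `action_history` entries is not specified here, and `save_screenshot` and `confirm_submit` are hypothetical helpers, not part of the library.

```python
import base64
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(b64_png: str, path: Path) -> Path:
    """Decode a base64 PNG and write it to disk for a human to inspect."""
    raw = base64.b64decode(b64_png, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not look like a PNG")
    path.write_bytes(raw)
    return path


def confirm_submit(answer: str) -> bool:
    """Treat only an explicit yes as approval; anything else blocks."""
    return answer.strip().lower() in {"y", "yes"}


if __name__ == "__main__":
    # Placeholder bytes standing in for a real screenshot.
    fake_b64 = base64.b64encode(PNG_SIGNATURE + b"placeholder").decode("ascii")
    out = save_screenshot(fake_b64, Path(tempfile.gettempdir()) / "review.png")
    print(f"Review {out} before submitting.")
```

Defaulting to "do not submit" on any answer other than an explicit yes keeps an ambiguous reply from triggering the irreversible step.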
End-to-end UI testing
"Sign up with email='test+{ts}@example.com' and verify the dashboard loads with the welcome banner. Report PASS or FAIL."
Adapts when the UI shifts (no flaky CSS selectors). Failure modes are captured in `action_history` for replay.
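One way to plug this recipe into a test runner is to fill in `{ts}` per run and turn the agent's final text into a boolean. The helpers below are a sketch; `build_goal` and `parse_verdict` are hypothetical names, and the approach assumes the agent's final text contains the literal word PASS or FAIL as the goal requests.

```python
import re
import time

GOAL_TEMPLATE = (
    "Sign up with email='test+{ts}@example.com' and verify the dashboard "
    "loads with the welcome banner. Report PASS or FAIL."
)


def build_goal(ts=None) -> str:
    """Fill the template with a timestamp so each run uses a fresh address."""
    if ts is None:
        ts = int(time.time())
    return GOAL_TEMPLATE.format(ts=ts)


def parse_verdict(final_text: str) -> bool:
    """Return True for PASS, False for FAIL, based on the last verdict word."""
    verdicts = re.findall(r"\b(PASS|FAIL)\b", final_text)
    if not verdicts:
        raise ValueError("no PASS/FAIL verdict found in agent output")
    return verdicts[-1] == "PASS"
```

Taking the last match rather than the first guards against output that restates the instruction ("Report PASS or FAIL") before giving the actual result.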
Internal SaaS without an API
"Log in to the analytics dashboard, navigate to the weekly report, capture the top-line numbers."
Use `share_browser=True` so credentials persist across calls. Combine with the verifier to gate any destructive actions.
Drive any browser.
Self-hosted. Yours.
Devin and OpenAI Operator are SaaS products. Ours is a library: fork it, run it on your own LLM, ship it in your own product.
pip install 'shipit-agent[anthropic,playwright]'