ComputerUseAgent · v1.0.8

Self-host Devin.
In thirty lines of Python.

Drive a real browser by showing screenshots to a vision-capable LLM. Use it standalone for one-shot workflows, or plug it into your main Agent as a `browser_use` tool. Native Anthropic support, plus a plain-text fallback for any other vision LLM.

Four steps. Endlessly composable.

The same loop drives a 4-iteration price lookup or a 30-iteration form-filling workflow. Recovery from failed actions is built in.

Step 01

Screenshot

Take a base64 PNG of the current viewport.

Step 02

Reason

The vision LLM looks at the screenshot and the goal, then picks the next action.

Step 03

Act

Click, type, scroll, navigate, or signal `done`.

Step 04

Loop

Repeat until the model emits `done` or the loop reaches `max_iterations`.
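
The four steps above can be sketched as a plain Python loop. This is an illustration of the control flow only, not the library's implementation: `Action`, `LoopResult`, `run_loop`, and the three callables are hypothetical names, and the browser and model are replaced with stubs so the sketch runs without Playwright or an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by the model (hypothetical shape, for illustration)."""
    kind: str           # "click", "type", "scroll", "navigate", or "done"
    argument: str = ""


@dataclass
class LoopResult:
    final_text: str
    action_history: List[Action] = field(default_factory=list)


def run_loop(
    take_screenshot: Callable[[], str],
    choose_action: Callable[[str, str], Action],
    perform: Callable[[Action], None],
    goal: str,
    max_iterations: int = 10,
) -> LoopResult:
    history: List[Action] = []
    for _ in range(max_iterations):
        screenshot = take_screenshot()            # Step 01: screenshot
        action = choose_action(screenshot, goal)  # Step 02: reason
        history.append(action)
        if action.kind == "done":                 # Step 04: stop on `done`
            return LoopResult(action.argument, history)
        perform(action)                           # Step 03: act
    # Step 04: stop because the iteration budget ran out
    return LoopResult("stopped: max_iterations reached", history)


# Stubs: a scripted "model" and a "browser" that only records actions.
scripted = iter([
    Action("navigate", "https://example.com"),
    Action("click", "Pricing"),
    Action("done", "Price found."),
])
performed: List[Action] = []
demo = run_loop(
    take_screenshot=lambda: "<base64 png>",
    choose_action=lambda screenshot, goal: next(scripted),
    perform=performed.append,
    goal="Find the price.",
)
print(demo.final_text, len(demo.action_history))
```

Recording every chosen action in a history list, including the final `done`, is what makes a run replayable afterwards.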

Two patterns

Standalone or as a tool inside your main Agent.

For one-shot browser work, run ComputerUseAgent directly. For production agents that mix browser work with web search, PDFs, RAG, or SQL, plug it in as BrowserAgentTool.

Pattern 1

Standalone ComputerUseAgent

single task

    from shipit_agent.computer_use import (
        ComputerUseAgent,
        PlaywrightBrowserSession,
    )

    # opus_llm is assumed to be a vision-capable LLM client you have already constructed.
    with PlaywrightBrowserSession.launch(headless=True) as browser:
        agent = ComputerUseAgent(
            llm=opus_llm,
            browser=browser,
            goal="Find iPhone 15 Pro starting price.",
            max_iterations=10,
        )
        result = agent.run()
        print(result.final_text)

Fastest path for one-shot workflows.

Pattern 2 · Recommended for production

BrowserAgentTool inside main Agent

composable

    from shipit_agent import Agent, VerifierNetwork
    from shipit_agent.computer_use import (
        BrowserAgentTool,
        PlaywrightBrowserSession,
    )

    # Assumed to exist already: opus_llm and haiku_llm (LLM clients), plus the
    # WebSearchTool and PDFTool classes imported from wherever your tools live.

    # 1. Browser tool: owns its own LLM + browser factory
    browser_tool = BrowserAgentTool(
        llm=opus_llm,
        browser_factory=lambda: PlaywrightBrowserSession.launch(headless=True),
        max_iterations=12,
    )

    # 2. Optional: verifier so destructive actions get gated
    verifier = VerifierNetwork(llm=haiku_llm, goal="Research only, no purchases.")

    # 3. Plug into your main planning Agent; browser_use is one tool among many
    agent = Agent(
        llm=opus_llm,
        tools=[browser_tool, WebSearchTool(), PDFTool()],
        verifier=verifier,
    )

    # 4. Run a high-level goal; the main agent decides when to call browser_use
    result = agent.run(
        "Find the cheapest direct SFO-JFK flight on May 20 "
        "and summarise the booking page."
    )

The main agent picks `browser_use` when it's the right tool, the same way it picks `web_search` or `pdf_extract`.

Real recipes

Production patterns where browser-driving agents actually pay off. Drop the goal into your main Agent, let `browser_use` handle the click-paths.

Recipe 01

Price comparison

Goal sent to the agent
"Find the lowest price for [item] across [3 sites] and report the cheapest with a link."

Headless browser visits each site, extracts price, returns a structured comparison. Pair with `output_schema=PriceCompare` for typed output.
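
For a sense of what a `PriceCompare` schema could look like, here is a minimal sketch using standard-library dataclasses. The field names, the `Offer` type, and the `cheapest()` helper are assumptions made for illustration, and the library may expect a different schema type (for example a Pydantic model), so check its docs before passing this to `output_schema`. The sample offers are made-up placeholder data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Offer:
    site: str
    price: float  # assumed to be in one shared currency
    url: str


@dataclass
class PriceCompare:
    item: str
    offers: List[Offer]

    def cheapest(self) -> Offer:
        """Return the offer with the lowest price."""
        return min(self.offers, key=lambda offer: offer.price)


# Placeholder data standing in for what the agent would extract.
comparison = PriceCompare(
    item="example widget",
    offers=[
        Offer("site-a.example", 24.99, "https://site-a.example/widget"),
        Offer("site-b.example", 19.50, "https://site-b.example/widget"),
        Offer("site-c.example", 21.00, "https://site-c.example/widget"),
    ],
)
best = comparison.cheapest()
print(f"{best.site}: {best.price:.2f} ({best.url})")
```

Keeping the link on each offer means the "cheapest with a link" part of the goal falls out of the schema instead of needing a second lookup.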

Recipe 02

Form filling at scale

Goal sent to the agent
"Fill the application form with [name=…, email=…, message=…]. Pause before submitting."

The run completes when the agent reaches the Submit button. A human reviews the screenshot in `result.action_history[-1]`, then clicks Submit for real.
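
To make the review step concrete, the sketch below writes a base64-encoded PNG to disk so a person can open it. The shape of an `action_history` entry is not shown here, so the sketch takes the base64 string directly. `save_screenshot` is a hypothetical helper, and how you pull the string out of `result.action_history[-1]` depends on the library's actual entry type.

```python
import base64
from pathlib import Path


def save_screenshot(png_b64: str, path: str) -> Path:
    """Decode a base64-encoded PNG and write it to disk for human review."""
    target = Path(path)
    target.write_bytes(base64.b64decode(png_b64))
    return target


# Stand-in payload: just the 8-byte PNG signature, not a real screenshot.
fake_png = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(fake_png).decode("ascii")
saved = save_screenshot(encoded, "review_before_submit.png")
print(f"Wrote {saved} ({saved.stat().st_size} bytes)")
```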

Recipe 03

End-to-end UI testing

Goal sent to the agent
"Sign up with email='test+{ts}@example.com' and verify the dashboard loads with the welcome banner. Report PASS or FAIL."

Works from screenshots rather than CSS selectors, so it adapts when the UI shifts. Failures are captured in `action_history` for replay.
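
In a test harness, the `{ts}` placeholder has to be filled in and the PASS or FAIL report turned into something a test runner can assert on. A minimal sketch, assuming the verdict arrives as plain text in `result.final_text`; `build_signup_goal` and `parse_verdict` are hypothetical helpers, not part of the library.

```python
import re
import time
from typing import Optional


def build_signup_goal(ts: Optional[int] = None) -> str:
    """Fill the timestamp into the goal so each run uses a fresh email."""
    ts = int(time.time()) if ts is None else ts
    return (
        f"Sign up with email='test+{ts}@example.com' and verify the dashboard "
        "loads with the welcome banner. Report PASS or FAIL."
    )


def parse_verdict(final_text: str) -> Optional[str]:
    """Return 'PASS' or 'FAIL' if exactly one of them appears, else None."""
    found = set(re.findall(r"\b(PASS|FAIL)\b", final_text))
    return found.pop() if len(found) == 1 else None


goal = build_signup_goal(ts=1700000000)
print(goal)
print(parse_verdict("PASS: welcome banner is visible."))
```

Returning `None` when both words or neither appear keeps an ambiguous report from being counted as a pass.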

Recipe 04

Internal SaaS without an API

Goal sent to the agent
"Log in to the analytics dashboard, navigate to the weekly report, capture the top-line numbers."

Use `share_browser=True` so credentials persist across calls. Combine with the verifier to gate any destructive actions.

Drive any browser.
Self-hosted. Yours.

Devin and OpenAI Operator are SaaS products. Ours is a library — fork it, run it on your own LLM, ship it in your own product.

pip install 'shipit-agent[anthropic,playwright]'