Citations & Batch API

Attach source documents with citations enabled and parse Claude's citations back out of metadata for verifiable RAG. Plus the Batch API runtime — BatchRequest / BatchResult / BatchRuntime.run() — for ~50%-cheaper bulk runs. New in v1.0.12.

4 min read
10 sections
Edit this page

New in v1.0.12, shipit_agent ships two Claude-API passthroughs for production-grade workloads: document citations (verifiable, grounded answers) and a Batch API runtime (bulk runs at roughly half price).

Citations — verifiable RAG

When you attach a source document block with citations.enabled, Claude grounds its answer in that document and emits citations on the text blocks of its reply — each pointing back at a character, page, or content-block range in the source. shipit parses those into LLMResponse.metadata["citations"], so every claim is traceable back to its source span. That's the difference between RAG that says it used a source and RAG you can verify.

Document helpers

The constructors live in shipit_agent.llms (from shipit_agent.llms.citations). Each builds a document content block in the SDK's source-param shape; citations are enabled by default — that is the whole point of the helper:

HelperSource typeUse for
text_document(text, *, title=None, context=None, citations=True)textPlain-text sources.
pdf_document(data_base64, *, title=None, context=None, citations=True)base64A base64-encoded PDF.
url_pdf_document(url, *, title=None, context=None, citations=True)urlA PDF fetched from a URL.
content_document(content, *, title=None, context=None, citations=True)contentA document assembled from content blocks.
python
from shipit_agent.llms import AnthropicChatLLM, text_document, url_pdf_document

# Attach default documents to every call on this adapter…
llm = AnthropicChatLLM(
    model="claude-sonnet-4-20250514",
    documents=[text_document("Refunds are processed within 5 business days.",
                      title="Refund policy"),
        url_pdf_document("https://example.com/handbook.pdf", title="Handbook"),],
)

response = llm.complete(messages=[...])   # or pass documents=[...] per call
print(response.content)

for cite in response.metadata.get("citations", []):
    print(cite)   # e.g. {"type": "char_location", "document_title": ..., ...}

Documents passed to AnthropicChatLLM(documents=[...]) are attached to every complete() call; a per-call documents=[...] argument overrides them. The blocks are prepended to the last user message.

Parsing citations back out

extract_citations(content_blocks) walks the response's text blocks, reads each block's citations (location objects like char_location, page_location, content_block_location), and returns them as plain dicts — which the adapter places at metadata["citations"] (only present when the response actually cited something). It's defensive throughout: any block shape that doesn't look like a cited text block is skipped, so non-citation responses simply yield [].

Provider note

Citations are an Anthropic feature. They work with AnthropicChatLLM (and Anthropic models reached via Bedrock / LiteLLM). Other providers ground answers through their own mechanisms; the document-citation shape here is Claude's.

Batch API — ~50%-cheaper bulk runs

For large, latency-tolerant workloads — evals, backfills, nightly summarisation, dataset labelling — the Anthropic Messages Batches API processes many requests asynchronously and is billed at roughly 50% of the standard per-token price. shipit wraps it in shipit_agent.batch.BatchRuntime.

python
from shipit_agent.batch import BatchRequest, BatchRuntime

runtime = BatchRuntime(api_key="sk-...")

results = runtime.run([BatchRequest(custom_id="q1", prompt="Summarise this ticket: ..."),
    BatchRequest(custom_id="q2", prompt="Classify sentiment: ..."),])

for r in results:
    if r.ok:
        print(r.custom_id, r.output)
    else:
        print(r.custom_id, "ERROR:", r.error)

BatchRequest

One request to include in a batch:

FieldMeaning
custom_idCaller-provided id, echoed back on the matching BatchResult. Unique within a batch.
promptConvenience single-turn user prompt (used to build messages when messages is not set).
messagesExplicit [{"role", "content"}] list; takes precedence over prompt.
systemOptional system prompt (omitted from the payload when None).
max_tokensMax output tokens for this request (default 1024).
modelModel id for this request (default claude-3-5-sonnet-latest).
metadataOptional metadata object forwarded to the API.

BatchResult

The outcome of a single batched request:

FieldMeaning
custom_idThe originating request's id.
outputConcatenated assistant text on success, otherwise "".
usageThe message usage object on success, otherwise None.
errorHuman-readable error for errored / canceled / expired results, otherwise None.
result_typeRaw discriminator: succeeded / errored / canceled / expired.
rawThe underlying SDK result object (stop reason, content blocks, request id, …).
.okTrue only when the request succeeded with no error captured.

BatchRuntime

BatchRuntime.run(requests, *, poll_interval=30.0, timeout=86400, ...) is the one-call path: it submits, polls until the batch ends, and returns mapped BatchResults. The lower-level operations are also exposed:

  • submit(requests) -> batch_id — create the batch.
  • status(batch_id) -> str"in_progress", "canceling", or "ended" (only "ended" is terminal).
  • results(batch_id) -> list[BatchResult] — stream and map the results.
  • cancel(batch_id) -> str — request cancellation; returns the new status.

run() raises TimeoutError if the batch doesn't reach "ended" within timeout (default 24h). Per-entry mapping is wrapped so a malformed entry becomes a BatchResult with error set rather than raising. Pass a pre-built client= (or any object exposing messages.batches.create/retrieve/results/ cancel) for testing; sleep= and now= on run() are injectable for deterministic tests. MessageBatchRunner is an alias for BatchRuntime.

Provider note

Both features are Anthropic today — citations use Anthropic's document blocks, and the batch runtime wraps Anthropic's Messages Batches API. OpenAI also has a Batch API; generalising the batch runtime across providers is on the roadmap.

See also