dox402 · pay-per-use AI

dox402

pay-per-use AI inference

Pay-per-use AI inference built on Cloudflare Workers and Durable Objects. Authenticate via EIP-4361 (SIWE), pay in USDC on Base through the x402 protocol, and start prompting — no signup, no API key.

Why Durable Objects

Most payment-gated APIs bolt a database onto a stateless server — Postgres for balances, Redis for rate-limiting, a separate auth layer for identity. Durable Objects collapse all of that into a single primitive.

Each wallet is routed to its own DO instance via idFromName, so a given wallet always maps to the same object. The Worker calls typed RPC methods directly on the stub: there is no HTTP routing inside the DO, just plain function calls with full type safety. On activation, blockConcurrencyWhile() in the constructor loads the wallet state from embedded SQLite once, before any request is served, so requests work from in-memory state instead of re-reading storage each time.

Each instance owns its token balance exclusively: no shared state, no contention, no double-spend. Replay prevention uses a seen_transactions SQL table to atomically check a tx hash and top up the balance in one step, at the edge, with no external infrastructure. The DO alarm handles grace-mode re-verification of provisional deposits when the Base RPC is unreachable, and runs TTL-based cleanup of expired records.
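The replay check is easier to reason about in isolation. The sketch below models one wallet's ledger in memory, assuming a `Map` stands in for the `seen_transactions` table and a plain field for the SQLite balance row; `WalletLedger` and `creditDeposit` are hypothetical names, not dox402's actual code, and on-chain verification of the deposit is assumed to have already happened.

```typescript
// Minimal in-memory model of the per-wallet ledger described above.
// Assumption: in the real Durable Object the balance and the seen_transactions
// table live in embedded SQLite; plain fields stand in here so the logic runs
// anywhere. WalletLedger and its methods are hypothetical names.

class WalletLedger {
  private balance = 0;
  // Stand-in for seen_transactions: tx hash -> tokens credited.
  private readonly seenTransactions = new Map<string, number>();

  /** Credits a verified deposit exactly once; returns false on a replayed tx hash. */
  creditDeposit(txHash: string, tokens: number): boolean {
    if (!Number.isInteger(tokens) || tokens <= 0) {
      throw new RangeError("tokens must be a positive integer");
    }
    const key = txHash.toLowerCase(); // hex hashes may arrive in mixed case
    if (this.seenTransactions.has(key)) return false; // replay attempt
    // Check, record, and credit run in one synchronous step with no await in
    // between, mirroring how a single DO instance handles this without another
    // request interleaving.
    this.seenTransactions.set(key, tokens);
    this.balance += tokens;
    return true;
  }

  get tokens(): number {
    return this.balance;
  }
}
```

Because every wallet gets its own instance, the replay table never needs a wallet column or a cross-wallet lock: the same hash submitted twice to one wallet is rejected by a single lookup.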

Conversation memory

Each wallet's DO instance stores the conversation in an embedded SQLite history table. On every request, the stored message array plus the new prompt is sent to the model, so follow-up questions and multi-turn reasoning work without the client managing any state. The reply is captured after the stream completes and saved via ctx.waitUntil(), so persisting it does not delay the response. Each assistant message stores usage metadata (cost, model) so those details persist across reloads. History is capped at 20 messages and tied to the wallet, not the browser.
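A sketch of the capping and request-building logic, assuming a simple message shape with optional usage metadata on assistant turns. The type names and the choice to drop the oldest messages first are assumptions; only the 20-message cap and the (cost, model) metadata come from the description above.

```typescript
// Sketch of the capped, wallet-scoped history described above.
// Assumptions: the message shape and field names are hypothetical; the real
// DO persists these rows in its SQLite history table.

type Role = "user" | "assistant";

interface Usage {
  model: string;
  cost: number; // balance tokens charged for this turn
}

interface Message {
  role: Role;
  content: string;
  usage?: Usage; // set on assistant messages only
}

interface ModelMessage {
  role: Role;
  content: string;
}

const MAX_HISTORY = 20;

/** Appends new messages and keeps only the most recent MAX_HISTORY. */
function appendCapped(history: Message[], ...added: Message[]): Message[] {
  return [...history, ...added].slice(-MAX_HISTORY);
}

/** Builds the model input: stored history first, then the new user prompt. */
function buildModelInput(history: Message[], prompt: string): ModelMessage[] {
  // Usage metadata is for display, so it is stripped before calling the model.
  const past = history.map(({ role, content }) => ({ role, content }));
  const next: ModelMessage = { role: "user", content: prompt };
  return [...past, next];
}
```

Appending a user/assistant pair per turn keeps the window aligned on turn boundaries, so the oldest retained message is always a user prompt.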

RAG — Document Knowledge Base

Each wallet can upload text documents that become part of its personal knowledge base. Documents are chunked (~400 tokens, 50-token overlap), embedded via Workers AI (bge-base-en-v1.5, 768-dim), and stored in a shared Cloudflare Vectorize index with per-wallet metadata filtering.

When useRag: true is set on an inference request, the user's prompt is embedded and matched against that wallet's documents (top-5 chunks, cosine similarity ≥ 0.45), and the relevant text is injected as a system message. Total input (prompt + history + RAG context) is validated against the selected model's context window; the server returns 413 if it is exceeded.

RAG context is ephemeral: it is not stored in history and is regenerated fresh on each request. RAG failure is non-fatal: if Vectorize is unreachable, inference proceeds without context. Embedding cost is deducted at document upload; the extra LLM input tokens from retrieved context are captured by the normal billing formula.
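The chunking and retrieval parameters can be illustrated locally. This sketch assumes whitespace-separated words as a stand-in for model tokens and computes cosine similarity in-process; in production Vectorize performs the similarity search with a wallet metadata filter, so `chunkText`, `retrieve`, and `StoredChunk` are hypothetical helpers that only mirror the stated numbers.

```typescript
// Sketch of the chunking and retrieval-filter parameters described above.
// Assumptions: words stand in for tokenizer tokens, and similarity is computed
// locally rather than by Vectorize. All names are hypothetical.

const CHUNK_TOKENS = 400;
const OVERLAP_TOKENS = 50;
const TOP_K = 5;
const MIN_SCORE = 0.45;

/** Splits text into windows of `size` words, each sharing `overlap` words with the previous one. */
function chunkText(text: string, size = CHUNK_TOKENS, overlap = OVERLAP_TOKENS): string[] {
  if (size <= overlap) throw new RangeError("overlap must be smaller than size");
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = size - overlap; // 350 new words per chunk with the defaults
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window reached the end
  }
  return chunks;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

interface StoredChunk {
  wallet: string; // metadata used for per-wallet filtering
  text: string;
  vector: number[];
}

/** Returns up to TOP_K of the wallet's chunks with cosine similarity >= MIN_SCORE, best first. */
function retrieve(query: number[], wallet: string, index: StoredChunk[]): StoredChunk[] {
  return index
    .filter((c) => c.wallet === wallet)
    .map((c) => ({ c, score: cosine(query, c.vector) }))
    .filter(({ score }) => score >= MIN_SCORE)
    .sort((x, y) => y.score - x.score)
    .slice(0, TOP_K)
    .map(({ c }) => c);
}
```

With the defaults, a 1,000-word document yields three chunks starting at words 0, 350, and 700; the 50-word overlap keeps a sentence that straddles a boundary retrievable from either side.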

Request flow diagrams

▶ Authentication

sequenceDiagram
    participant C as Client
    participant W as Worker
    participant DO as Durable Object
    C->>W: GET /auth/nonce
    W->>DO: RPC handleNonce()
    DO-->>C: nonce
    C->>W: POST /auth/login
    W->>W: recover address [EIP-191]
    W->>DO: RPC handleVerifyNonce
    W-->>C: Set-Cookie ig_session

SIWE (default): The flow above shows standard Sign-In with Ethereum (EIP-4361). The client requests a nonce, signs a message with their wallet, and receives a session cookie for subsequent authenticated requests.
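For reference, the message a wallet signs in the SIWE flow follows the fixed line layout of EIP-4361. The sketch below builds one; the domain, URI, statement, and the use of Base mainnet's chain ID (8453) are illustrative assumptions, since the exact values the dox402 server checks are not documented here. The optional EIP-4361 fields (Expiration Time, Resources, and so on) are omitted, and the statement is always included.

```typescript
// Builds an EIP-4361 (Sign-In with Ethereum) message for personal_sign (EIP-191).
// Assumptions: field values are illustrative; optional EIP-4361 fields are omitted.

interface SiweFields {
  domain: string; // RFC 3986 authority requesting the sign-in, e.g. "example.com"
  address: string; // EIP-55 checksummed wallet address
  statement: string; // human-readable line shown in the wallet
  uri: string;
  chainId: number; // 8453 is Base mainnet
  nonce: string; // value returned by GET /auth/nonce
  issuedAt: string; // ISO 8601 timestamp
}

function buildSiweMessage(f: SiweFields): string {
  // Line order is fixed by EIP-4361; blank lines separate the address,
  // the statement, and the key/value block.
  return [
    `${f.domain} wants you to sign in with your Ethereum account:`,
    f.address,
    "",
    f.statement,
    "",
    `URI: ${f.uri}`,
    "Version: 1",
    `Chain ID: ${f.chainId}`,
    `Nonce: ${f.nonce}`,
    `Issued At: ${f.issuedAt}`,
  ].join("\n");
}
```

The client would sign this string with the wallet's personal_sign and send message plus signature to POST /auth/login; the server then recovers the address (the EIP-191 step in the diagram) and checks the nonce.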

SIWX alternative: x402-compatible clients can skip the nonce/login steps and pass a SIGN-IN-WITH-X header on POST /infer for single-request wallet auth. The 402 response advertises this via a sign-in-with-x extension.

▶ Inference (with RAG)

sequenceDiagram
    participant C as Client
    participant W as Worker
    participant DO as Durable Object
    participant V as Vectorize
    participant AI as Workers AI
    C->>W: POST /infer + cookie
    W->>W: verify session
    W->>DO: RPC handleInfer
    DO->>DO: load balance + history
    alt useRag enabled
        DO->>AI: embed prompt
        AI-->>DO: query vector
        DO->>V: query vector + wallet filter
        V-->>DO: matching chunks
        alt no chunks found
            DO->>DO: SQL fallback - full docs
        end
        DO->>DO: inject files as system msg
    end
    DO->>DO: validate tokens vs context window
    alt over limit
        DO-->>C: 413 input too large
    end
    DO->>AI: messages array
    AI-->>C: SSE stream
    DO->>DO: save reply + deduct cost
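The "validate tokens vs context window" step reduces to a sum and a comparison. This sketch uses the context sizes from the pricing table, but assumes display names as keys (not real Workers AI model IDs) and a rough 4-characters-per-token estimate instead of each model's tokenizer; it also reserves no room for output tokens, which a real implementation may do.

```typescript
// Sketch of the context-window check that produces the 413 above.
// Assumptions: keys are display names, and token counts are estimated at
// ~4 characters per token rather than computed with the model's tokenizer.

const CONTEXT_WINDOW: Record<string, number> = {
  "Llama 3.1 8B": 7968,
  "Llama 3.3 70B": 24000,
  "Gemma 3 12B": 8000,
  "Mistral 7B": 8000,
  "DeepSeek R1 32B": 80000,
};

interface ChatMessage {
  role: string;
  content: string;
}

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

/** Returns 413 when prompt + history + RAG context exceed the model's window, else 200. */
function checkContext(model: string, messages: ChatMessage[]): 200 | 413 {
  const limit = CONTEXT_WINDOW[model];
  if (limit === undefined) throw new Error(`unknown model: ${model}`);
  const total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  return total > limit ? 413 : 200;
}
```

Since the RAG system message is just another entry in `messages`, retrieved context counts against the same budget as history and prompt.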

▶ File upload (RAG)

sequenceDiagram
    participant C as Client
    participant W as Worker
    participant DO as Durable Object
    participant V as Vectorize
    participant AI as Workers AI
    C->>W: POST /documents
    W->>DO: RPC handleDocumentUpload
    DO->>DO: save to SQLite
    DO->>DO: chunk text
    DO->>AI: embed chunks
    AI-->>DO: vectors
    DO->>V: upsert vectors + metadata
    V-->>DO: ok
    DO-->>C: doc id + title + cost

Endpoints

Public
GET    /health        liveness probe
GET    /payment-info  payment address + network details

Auth (SIWE / SIWX)
GET    /auth/nonce    generate one-time nonce
POST   /auth/login    verify signature → session cookie (SIWE)
POST   /auth/logout   clear session cookie
       SIGN-IN-WITH-X header for single-request auth (SIWX)

Authenticated (Cookie or SIWX)
POST   /infer     run inference (post-billed, optional systemPrompt)
POST   /deposit   top up balance without inference
GET    /balance   token balance + usage stats
GET    /history   conversation messages + meta
DELETE /history   clear conversation

RAG Documents (Cookie)
POST   /documents         upload document for RAG
GET    /documents         list uploaded documents
DELETE /documents/:id     delete document + embeddings
POST   /documents/reindex re-upsert all document vectors

Admin (Bearer ADMIN_SECRET)
GET    /admin/wallets                  paginated wallet list
GET    /admin/wallets/:wallet/status   wallet status + balance
GET    /admin/stats                    aggregate statistics
GET    /admin/stale                    zero-balance inactive wallets
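The admin routes are gated by a Bearer token. A minimal sketch of that check follows; the constant-time comparison is our own choice rather than something the page documents, and `isAdmin` is a hypothetical helper. It avoids returning at the first mismatched byte, so response timing does not reveal how much of the secret a caller guessed.

```typescript
// Sketch of the Bearer ADMIN_SECRET check for /admin/* routes.
// Assumption: the comparison strategy and helper name are illustrative.

function isAdmin(authorization: string | null, adminSecret: string): boolean {
  if (adminSecret.length === 0) return false; // never accept an unset secret
  if (authorization === null || !authorization.startsWith("Bearer ")) return false;
  const enc = new TextEncoder();
  const presented = enc.encode(authorization.slice("Bearer ".length));
  const expected = enc.encode(adminSecret);
  // Accumulate differences over the whole expected secret and fold in the
  // length difference instead of exiting early on the first mismatch.
  let diff = presented.length ^ expected.length;
  for (let i = 0; i < expected.length; i++) {
    diff |= expected[i] ^ (i < presented.length ? presented[i] : 0);
  }
  return diff === 0;
}
```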

Payment flow

1. Sign in with wallet (SIWE or SIWX header) → receive session
2. POST /infer with 0 balance → 402 + PAYMENT-REQUIRED header (includes sign-in-with-x extension for x402 clients)
3. Send USDC on Base Mainnet → POST /deposit with proof → balance topped up
4. Subsequent POST /infer requests deduct from balance automatically
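On the client, the 402 step is a branch rather than an error. The sketch below calls /infer and surfaces the PAYMENT-REQUIRED header when the balance is empty. The JSON body fields (`prompt`, `model`, `useRag`) and the return shape are assumptions, the header value is passed through untouched rather than decoded, and `fetch` is injected so the function can be exercised without a live server.

```typescript
// Client-side sketch of the payment flow: call /infer and surface a 402.
// Assumptions: body field names and the InferOutcome shape are hypothetical.

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

type InferOutcome =
  | { kind: "ok"; response: Response }
  | { kind: "payment-required"; paymentRequired: string | null };

async function infer(
  baseUrl: string,
  prompt: string,
  fetchImpl: FetchLike = fetch,
  opts: { model?: string; useRag?: boolean } = {},
): Promise<InferOutcome> {
  const res = await fetchImpl(`${baseUrl}/infer`, {
    method: "POST",
    credentials: "include", // send the session cookie set by /auth/login
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, ...opts }),
  });
  if (res.status === 402) {
    // Zero balance: hand the x402 payment requirements back to the caller,
    // who pays in USDC on Base and then calls POST /deposit.
    return { kind: "payment-required", paymentRequired: res.headers.get("PAYMENT-REQUIRED") };
  }
  if (!res.ok) throw new Error(`infer failed: HTTP ${res.status}`);
  return { kind: "ok", response: res }; // body is an SSE stream
}
```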

Pricing

0.001 USDC → 1,000 tokens

  Model             Context     Cost
  Llama 3.1 8B       7,968 tok  ~8–10 tokens/req
  Llama 3.3 70B     24,000 tok  ~9–13 tokens/req
  Gemma 3 12B        8,000 tok  ~8–10 tokens/req
  Mistral 7B         8,000 tok  ~4–8 tokens/req
  DeepSeek R1 32B   80,000 tok  ~10–25 tokens/req
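USDC uses 6 decimals, so the stated rate (0.001 USDC → 1,000 tokens) works out to exactly one token per smallest USDC unit (0.000001 USDC). The sketch below converts between the two with integer arithmetic to avoid floating-point rounding; the helper names are hypothetical. At ~8–10 tokens per request, a 1,000-token top-up covers roughly 100 to 125 Llama 3.1 8B requests.

```typescript
// Converts between USDC amounts and balance tokens at 0.001 USDC -> 1,000 tokens.
// With 6 USDC decimals that is 1 token per base unit, so no scaling factor is needed.

const USDC_DECIMALS = 6;

/** Parses a decimal USDC string (e.g. "0.005") into tokens without floating point. */
function usdcToTokens(amount: string): bigint {
  const m = /^(\d+)(?:\.(\d+))?$/.exec(amount.trim());
  if (!m) throw new Error(`invalid USDC amount: ${amount}`);
  const whole = m[1];
  const frac = m[2] ?? "";
  if (frac.length > USDC_DECIMALS) throw new Error("USDC supports at most 6 decimal places");
  // "0.001" -> "0" + "001000" -> 1000 base units -> 1000 tokens
  return BigInt(whole + frac.padEnd(USDC_DECIMALS, "0"));
}

/** Formats a token count back into a 6-decimal USDC string. */
function tokensToUsdc(tokens: bigint): string {
  const s = tokens.toString().padStart(USDC_DECIMALS + 1, "0");
  return `${s.slice(0, -USDC_DECIMALS)}.${s.slice(-USDC_DECIMALS)}`;
}
```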

Machine-readable specs

GET /openapi.json                           OpenAPI 3.1 specification
GET /.well-known/agent.json                 A2A agent card
GET /.well-known/agents.json                Agents.json (multi-step flows)
GET /.well-known/api-catalog                API Catalog (RFC 9727 linkset)
GET /.well-known/agent-skills/index.json    Agent Skills index (v0.2.0)
GET /SKILL.md                               Agent-readable markdown
GET /robots.txt                             Crawl rules + Content Signals
GET /sitemap.xml                            Canonical URLs

Content negotiation on /

GET / -H "Accept: text/markdown"     Returns SKILL.md as text/markdown
GET / -H "Accept: text/html"         HTML response includes RFC 8288 Link headers
                                     pointing at openapi.json, agent.json, agents.json,
                                     and SKILL.md for spec discovery
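The negotiation on / can be expressed as a small header-selection function. In this sketch the substring match on Accept (no q-value parsing) and the `rel` values are illustrative choices rather than dox402's actual headers; `service-desc` is the relation RFC 8631 registers for API descriptions such as OpenAPI documents. `Vary: Accept` is set on both variants so caches keep them apart.

```typescript
// Sketch of content negotiation on GET /. Assumptions: simple substring
// matching on Accept and the chosen rel values are illustrative.

type Representation = "markdown" | "html";

function pickRepresentation(accept: string | null): Representation {
  // Markdown only when explicitly requested; browsers and other clients get HTML.
  return accept !== null && accept.toLowerCase().includes("text/markdown") ? "markdown" : "html";
}

function discoveryLinkHeader(): string {
  // RFC 8288 Link header: comma-separated `<target>; rel="..."` entries.
  const links: string[] = [
    '</openapi.json>; rel="service-desc"', // relation registered by RFC 8631
    '</.well-known/agent.json>; rel="describedby"',
    '</.well-known/agents.json>; rel="describedby"',
    '</SKILL.md>; rel="alternate"; type="text/markdown"',
  ];
  return links.join(", ");
}

function responseHeaders(accept: string | null): Record<string, string> {
  if (pickRepresentation(accept) === "markdown") {
    return { "Content-Type": "text/markdown; charset=utf-8", Vary: "Accept" };
  }
  return { "Content-Type": "text/html; charset=utf-8", Link: discoveryLinkHeader(), Vary: "Accept" };
}
```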

In-browser agents (WebMCP)

When loaded in a WebMCP-capable browser, this page registers tools via
navigator.modelContext.registerTool():

  connect_wallet     Trigger SIWE sign-in
  get_balance        Read current token balance
  send_inference     Run inference with prompt + optional model
  view_history       Return conversation history
  open_deposit_ui    Open the deposit panel (no auto-spend)
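A sketch of how one of these tools might be registered. The descriptor shape (name, description, inputSchema, and an async execute returning MCP-style text content) follows the WebMCP draft as we understand it and may differ from the current proposal; the local interfaces and the balance-fetching callback are hypothetical. Feature detection keeps the page working in browsers without `navigator.modelContext`.

```typescript
// Sketch of registering the get_balance tool via WebMCP.
// Assumptions: the descriptor shape and these minimal interfaces are illustrative.

interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
}

interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema for the tool's arguments
  execute: (input: Record<string, unknown>) => Promise<ToolResult>;
}

interface ModelContextLike {
  registerTool(tool: ToolDescriptor): unknown;
}

/** Wraps a balance lookup (e.g. GET /balance with the session cookie) as a tool. */
function makeGetBalanceTool(fetchBalance: () => Promise<unknown>): ToolDescriptor {
  return {
    name: "get_balance",
    description: "Read current token balance",
    inputSchema: { type: "object", properties: {} }, // takes no arguments
    execute: async () => {
      const data = await fetchBalance();
      const result: ToolResult = { content: [{ type: "text", text: JSON.stringify(data) }] };
      return result;
    },
  };
}

/** Registers the tool only when navigator.modelContext exists (WebMCP-capable browser). */
function registerIfSupported(nav: { modelContext?: ModelContextLike }, tool: ToolDescriptor): boolean {
  if (!nav.modelContext) return false;
  nav.modelContext.registerTool(tool);
  return true;
}
```

In a page this would be called as `registerIfSupported(navigator as unknown as { modelContext?: ModelContextLike }, tool)`. Keeping open_deposit_ui limited to opening a panel, rather than exposing a tool that sends funds, leaves every spend under explicit user control.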
