Pay-per-use AI inference built on Cloudflare Workers and Durable Objects. Authenticate via EIP-4361 (SIWE), pay in USDC on Base through the x402 protocol, and start prompting — no signup, no API key.

Why Durable Objects

Most payment-gated APIs bolt a database onto a stateless server — Postgres for balances, Redis for rate-limiting, a separate auth layer for identity. Durable Objects collapse all of that into a single primitive.

Each wallet is routed to its own DO instance via idFromName. The Worker calls typed RPC methods directly on the stub — no HTTP routing inside the DO, just plain function calls with full type safety. On activation, blockConcurrencyWhile() in the constructor loads the wallet state from embedded SQLite once, before any request is served — eliminating per-request persistence overhead. Each instance owns its token balance exclusively — no shared state, no contention, no double-spend. Replay prevention uses a seen_transactions SQL table to atomically check tx hashes and top up the balance, at the edge, with no external infrastructure. The DO alarm handles grace-mode re-verification of provisional deposits when the Base RPC is unreachable, and runs TTL-based cleanup of expired records.

Conversation memory

Each wallet's DO instance stores the conversation in an embedded SQLite history table. On every request, the full message array is sent to the model — so follow-up questions and multi-turn reasoning work without the client managing any state. The reply is captured and saved via ctx.waitUntil() after the stream, adding zero latency. Each assistant message stores usage metadata (cost, model) so details persist across reloads. History is capped at 20 messages and tied to the wallet, not the browser.
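A minimal sketch of the 20-message cap, with illustrative names (the real implementation works against the SQLite `history` table rather than an array):

```typescript
// Illustrative sketch of the 20-message history cap: after appending a
// message, keep only the most recent entries so the array sent to the
// model stays bounded.

interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

const HISTORY_LIMIT = 20; // cap stated above

function appendWithCap(history: ChatMessage[], msg: ChatMessage): ChatMessage[] {
  const next = [...history, msg];
  // Drop the oldest entries once the cap is exceeded.
  return next.length > HISTORY_LIMIT ? next.slice(next.length - HISTORY_LIMIT) : next;
}
```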

RAG — Document Knowledge Base

Each wallet can upload text documents that become part of its personal knowledge base. Documents are chunked (~400 tokens, 50-token overlap), embedded via Workers AI (bge-base-en-v1.5, 768-dim), and stored in a shared Cloudflare Vectorize index with per-wallet metadata filtering. When useRag: true is set on an inference request, the user's prompt is embedded, matched against their documents (top-5 chunks, cosine similarity ≥ 0.45), and the relevant text is injected as a system message. Total input (prompt + history + RAG context) is validated against each model's context window — the server returns 413 if exceeded. RAG context is ephemeral — not stored in history — and regenerated fresh each request. RAG failure is non-fatal: if Vectorize is unreachable, inference proceeds without context. Embedding cost is deducted at document upload; the extra LLM input tokens from retrieved context are captured by the normal billing formula.
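The chunking step might look roughly like this sketch, which uses whitespace splitting as a stand-in for real tokenization; the chunk and overlap sizes follow the figures above, and the function name is invented:

```typescript
// Rough sketch of document chunking: ~400-token chunks with a 50-token
// overlap, using whitespace "tokens" as an approximation of the real
// tokenizer.

function chunkText(text: string, chunkSize = 400, overlap = 50): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const stride = chunkSize - overlap; // each chunk starts 350 tokens after the last
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= tokens.length) break; // final chunk reached the end
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding ~14% of tokens twice.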

Request flow diagrams
Authentication
```mermaid
sequenceDiagram
  participant C as Client
  participant W as Worker
  participant DO as Durable Object
  C->>W: GET /auth/nonce
  W->>DO: RPC handleNonce()
  DO-->>C: nonce
  C->>W: POST /auth/login
  W->>W: recover address [EIP-191]
  W->>DO: RPC handleVerifyNonce
  W-->>C: Set-Cookie ig_session
```

SIWE (default): The flow above shows standard Sign-In with Ethereum (EIP-4361). The client requests a nonce, signs a message with their wallet, and receives a session cookie for subsequent authenticated requests.
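For reference, the message the wallet signs follows the EIP-4361 template. A sketch of constructing it, assuming typical field values; the service's actual domain, URI, and statement may differ (Base mainnet's chain ID is 8453):

```typescript
// Sketch of an EIP-4361 (SIWE) message with no statement line. Field
// values are placeholders — the real domain, URI, and nonce come from
// the service and GET /auth/nonce.

interface SiweFields {
  domain: string;
  address: string;
  uri: string;
  chainId: number;
  nonce: string;    // one-time nonce from GET /auth/nonce
  issuedAt: string; // ISO 8601 timestamp
}

function buildSiweMessage(f: SiweFields): string {
  return [
    `${f.domain} wants you to sign in with your Ethereum account:`,
    f.address,
    "", // blank lines where an optional statement would go
    "",
    `URI: ${f.uri}`,
    "Version: 1",
    `Chain ID: ${f.chainId}`,
    `Nonce: ${f.nonce}`,
    `Issued At: ${f.issuedAt}`,
  ].join("\n");
}
```

The client signs this string with EIP-191 personal_sign; the Worker recovers the address from the signature and asks the wallet's DO to verify and consume the nonce.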

SIWX alternative: x402-compatible clients can skip the nonce/login steps and pass a SIGN-IN-WITH-X header on POST /infer for single-request wallet auth. The 402 response advertises this via a sign-in-with-x extension.

Inference (with RAG)
```mermaid
sequenceDiagram
  participant C as Client
  participant W as Worker
  participant DO as Durable Object
  participant V as Vectorize
  participant AI as Workers AI
  C->>W: POST /infer + cookie
  W->>W: verify session
  W->>DO: RPC handleInfer
  DO->>DO: load balance + history
  alt useRag enabled
    DO->>AI: embed prompt
    AI-->>DO: query vector
    DO->>V: query vector + wallet filter
    V-->>DO: matching chunks
    alt no chunks found
      DO->>DO: SQL fallback - full docs
    end
    DO->>DO: inject files as system msg
  end
  DO->>DO: validate tokens vs context window
  alt over limit
    DO-->>C: 413 input too large
  end
  DO->>AI: messages array
  AI-->>C: SSE stream
  DO->>DO: save reply + deduct cost
```

File upload (RAG)
```mermaid
sequenceDiagram
  participant C as Client
  participant W as Worker
  participant DO as Durable Object
  participant V as Vectorize
  participant AI as Workers AI
  C->>W: POST /documents
  W->>DO: RPC handleDocumentUpload
  DO->>DO: save to SQLite
  DO->>DO: chunk text
  DO->>AI: embed chunks
  AI-->>DO: vectors
  DO->>V: upsert vectors + metadata
  V-->>DO: ok
  DO-->>C: doc id + title + cost
```

Endpoints
Public
GET    /health        liveness probe
GET    /payment-info  payment address + network details

Auth (SIWE / SIWX)
GET    /auth/nonce    generate one-time nonce
POST   /auth/login    verify signature → session cookie (SIWE)
POST   /auth/logout   clear session cookie
       SIGN-IN-WITH-X header for single-request auth (SIWX)

Authenticated (Cookie or SIWX)
POST   /infer     run inference (post-billed, optional systemPrompt)
POST   /deposit   top up balance without inference
GET    /balance   token balance + usage stats
GET    /history   conversation messages + meta
DELETE /history   clear conversation

RAG Documents (Cookie)
POST   /documents          upload document for RAG
GET    /documents          list uploaded documents
DELETE /documents/:id      delete document + embeddings
POST   /documents/reindex  re-upsert all document vectors

Admin (Bearer ADMIN_SECRET)
GET    /admin/wallets              paginated wallet list
GET    /admin/wallets/:wallet/status  wallet status + balance
GET    /admin/stats                aggregate statistics
GET    /admin/stale                zero-balance inactive wallets
Payment flow
  1. Sign in with wallet (SIWE or SIWX header) → receive session
  2. POST /infer with 0 balance → 402 + PAYMENT-REQUIRED header (includes sign-in-with-x extension for x402 clients)
  3. Send USDC on Base Mainnet → POST /deposit with proof → balance topped up
  4. Subsequent POST /infer requests deduct from balance automatically
Pricing
0.001 USDC → 1,000 tokens

  Model             Context     Cost
  Llama 3.1 8B       7,968 tok  ~8–10 tokens/req
  Llama 3.3 70B     24,000 tok  ~9–13 tokens/req
  Gemma 3 12B        8,000 tok  ~8–10 tokens/req
  Mistral 7B         8,000 tok  ~4–8 tokens/req
  DeepSeek R1 32B   80,000 tok  ~10–25 tokens/req
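The rate above works out to exactly one USDC base unit per token, since USDC has 6 decimals and 0.001 USDC buys 1,000 tokens. A quick sketch of the conversion (function names are illustrative, not the service's API):

```typescript
// Conversion implied by the rate: 0.001 USDC -> 1,000 tokens, so one
// token costs exactly one base unit of USDC (USDC uses 6 decimals).

const TOKENS_PER_USDC = 1_000_000; // 1,000 tokens / 0.001 USDC

// On-chain deposits arrive as integer base units (micro-USDC).
function tokensForDeposit(microUsdc: number): number {
  return microUsdc; // 1 micro-USDC = 1 token at this rate
}

function tokensToUsdcString(tokens: number): string {
  return (tokens / TOKENS_PER_USDC).toFixed(6);
}
```

Working in integer base units keeps the accounting exact: a 0.05 USDC deposit is 50,000 base units on chain and credits 50,000 tokens, with no floating-point rounding in between.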
Machine-readable specs
GET /openapi.json             OpenAPI 3.1 specification
GET /.well-known/agent.json   A2A agent card
GET /.well-known/agents.json  Agents.json (multi-step flows)
GET /SKILL.md                 Agent-readable markdown