Pay-per-use AI inference via the x402 payment protocol. No signup, no API key — your wallet is your identity, your balance, and your session.
Most payment-gated APIs bolt a database onto a stateless server — Postgres for balances, Redis for rate-limiting, a separate auth layer for identity. Durable Objects collapse all of that into a single primitive.
Each wallet is routed to its own DO instance via idFromName. That instance owns its µUSDC balance exclusively — no shared state, no contention, no double-spend. Replay prevention uses storage.transaction() to atomically check a seen:{txHash} key and top up the balance in one operation, at the edge, with no external infrastructure.
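The replay check described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the `Txn` and `TxStorage` interfaces only mirror the `get`/`put`/`transaction` subset of the Durable Object storage API, the function name `creditDeposit` and the `balance` key are hypothetical, and `MemoryStorage` is an in-memory stand-in so the example runs outside Workers.

```typescript
// Routing (inside the Worker) would look something like this; the binding
// name WALLET is an assumption:
//   const stub = env.WALLET.get(env.WALLET.idFromName(walletAddress));

// Subset of the Durable Object storage API that this sketch relies on.
interface Txn {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}
interface TxStorage {
  transaction<T>(closure: (txn: Txn) => Promise<T>): Promise<T>;
}

// Credits a deposit at most once per txHash.
// Returns the new balance in µUSDC, or null if the txHash was already seen.
async function creditDeposit(
  storage: TxStorage,
  txHash: string,
  amountMicroUSDC: number,
): Promise<number | null> {
  return storage.transaction(async (txn) => {
    const seenKey = `seen:${txHash}`;
    if (await txn.get<boolean>(seenKey)) return null; // replayed payment
    const balance = (await txn.get<number>("balance")) ?? 0; // hypothetical key
    const next = balance + amountMicroUSDC;
    await txn.put(seenKey, true); // mark the hash and top up together
    await txn.put("balance", next);
    return next;
  });
}

// In-memory stand-in for local experiments only; NOT the real storage backend.
class MemoryStorage implements TxStorage {
  private data = new Map<string, unknown>();
  private queue: Promise<unknown> = Promise.resolve();

  transaction<T>(closure: (txn: Txn) => Promise<T>): Promise<T> {
    const txn: Txn = {
      get: async <V>(k: string) => this.data.get(k) as V | undefined,
      put: async <V>(k: string, v: V) => {
        this.data.set(k, v);
      },
    };
    // Run transactions one after another so each sees the previous writes.
    const run = this.queue.then(() => closure(txn));
    this.queue = run.catch(() => undefined);
    return run;
  }
}
```

Because the seen-marker and the balance update sit in the same transaction, a retried or replayed proof cannot credit the wallet twice, even if two requests carrying the same txHash arrive together.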
Each wallet's DO instance stores the conversation under a history key. On every request, the full message array is sent to the model, so follow-up questions and multi-turn reasoning work without the client managing any state. The reply is captured and saved via ctx.waitUntil() after the stream completes, so persisting it adds no latency to the response. History is capped at 20 messages and tied to the wallet, not the browser.
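The 20-message cap can be illustrated with a small pure function. This is a sketch under assumptions: the role values and the names `ChatMessage`, `MAX_HISTORY` and `appendCapped` are hypothetical, and the text above only states that entries have the shape { role, content } and that the oldest messages fall away once the cap is reached.

```typescript
// Assumed message shape; the role values are a guess at the usual chat roles.
type Role = "system" | "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

const MAX_HISTORY = 20; // cap stated in the text above

// Returns a new array with the added messages appended, keeping only the
// most recent MAX_HISTORY entries. The input array is not mutated.
function appendCapped(
  history: readonly ChatMessage[],
  ...added: ChatMessage[]
): ChatMessage[] {
  return [...history, ...added].slice(-MAX_HISTORY);
}

// Inside the Durable Object, the save could then be deferred until after the
// stream, for example (hypothetical wiring):
//   ctx.waitUntil(storage.put("history", appendCapped(history, userMsg, reply)));
```

Keeping the cap in a pure helper makes the trimming rule easy to check in isolation from the storage and streaming code.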
POST /infer run inference (post-billed by token usage)
GET /balance { balance, totalDepositedMicroUSDC, totalSpentMicroUSDC, totalRequests }
GET /history [ { role, content }, … ]
DELETE /history clear conversation
POST /infer → 402 + PAYMENT-REQUIRED header (base64 JSON)
Send 0.001 USDC on Base Mainnet, get a txHash
Retry with PAYMENT-SIGNATURE: btoa(proof) → 200 SSE stream + balance topped up

0.001 USDC → 1,000 µUSDC balance
Billed at actual token usage · 1 neuron = 11 µUSDC
Llama 3.1 8B ~200–500 credits per typical request
Llama 3.3 70B ~500–2000 credits per typical request
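The header encoding and the pricing arithmetic above can be sketched as a few helpers. Treat this as an illustration only: the payload shapes `PaymentRequired` and `PaymentProof` are hypothetical (the text only says "base64 JSON" and "proof"), all function names are made up for the example, and rounding the cost up to a whole µUSDC is an assumption.

```typescript
// Hypothetical payload shapes; the real field names are not given above.
interface PaymentRequired {
  amountUSDC: number;
  network: string;
  payTo: string;
}
interface PaymentProof {
  txHash: string;
}

const MICRO_PER_USDC = 1_000_000; // USDC has 6 decimals
const MICRO_USDC_PER_NEURON = 11; // rate stated in the pricing above

// Base64-encode a JSON value for use in a header (ASCII payloads only).
function encodeHeader(value: unknown): string {
  return btoa(JSON.stringify(value));
}

// Decode a base64 JSON header value back into an object.
function decodeHeader<T>(header: string): T {
  return JSON.parse(atob(header)) as T;
}

// Convert a USDC amount into whole µUSDC.
function toMicroUSDC(usdc: number): number {
  return Math.round(usdc * MICRO_PER_USDC);
}

// Cost of a request in µUSDC; rounding up is an assumption of this sketch.
function costMicroUSDC(neurons: number): number {
  return Math.ceil(neurons * MICRO_USDC_PER_NEURON);
}

// Client-side flow, in outline:
//   1. POST /infer; on 402, read the PAYMENT-REQUIRED header and decodeHeader() it.
//   2. Send the requested USDC on Base Mainnet and keep the txHash.
//   3. Retry POST /infer with PAYMENT-SIGNATURE: encodeHeader({ txHash }).
```

With these helpers, the 0.001 USDC deposit converts to 1,000 µUSDC, and a request that used 30 neurons would be billed 330 µUSDC under the stated rate.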