Engineering · Published 2026-05-20 · Read time: 10 min

MCP server for eSIM — a developer’s guide

How we built the first agent-native eSIM API: token lifecycle, five-layer spending defense, HMAC payment verification, kill switches, and the design choices that make crypto rails uniquely suited to autonomous agents.

Most online services were built for browsers. A user sits in front of a session-cookie-authenticated page, clicks buttons, types fields, and the backend assumes that workflow throughout. AI agents do not fit that mold — they have no browser session, no DOM to click, no camera to scan a QR code, and no patience for OAuth dances. Building for agents means rebuilding the auth, payment, and delivery layers around different assumptions.

This post walks through the architectural choices behind the Roamzy MCP server. It is intended for developers building AI agents that interact with real commerce systems, and for engineers at other services who are wondering what an "agent-ready" API actually looks like in production.

The MCP server is a thin wrapper

The MCP server itself is small — about 400 lines of TypeScript, bundled into a 100KB tarball via esbuild. It exists only to translate between two protocols: stdio-based MCP requests from Claude Desktop / Cursor / Continue, and HTTPS REST calls to https://roamzy.io/api/v1/*.

There is no business logic in the MCP server. Spending limits, token validation, payment verification — all of that lives on the backend, where it can be tested in isolation, audited centrally, and changed without forcing every agent user to update their MCP server. The MCP server is just the protocol bridge.

This is the right boundary. Putting business logic in the MCP server would split it across N+1 places (server + every client's installed tarball), introduce upgrade lag, and create attack surface on the user's laptop. By keeping the MCP server pure HTTP-client glue, we get a thin layer that can be easily audited (about 30 minutes for a security-conscious user) and updated by simply re-publishing the tarball at the same URL.
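To make the "protocol bridge" idea concrete, here is a minimal sketch of what such glue could look like. It assumes a hypothetical tool table (get_status, get_order, create_order) in which each MCP tool call maps to exactly one REST request. The tool names and argument shapes are illustrative, not the actual Roamzy tool list, and the MCP SDK wiring is omitted. Only the mapping and the forwarding step are shown.

```typescript
// Hypothetical tool names; the real server's tool list may differ.
export type ToolName = "get_status" | "get_order" | "create_order";

export interface HttpCall {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

const BASE_URL = "https://roamzy.io/api/v1";

// Pure translation: one tool call becomes one HTTP request. No spending
// limits or validation live here; the backend owns all of that.
export function toHttpCall(
  tool: ToolName,
  args: Record<string, unknown>,
  token: string,
): HttpCall {
  const base: Record<string, string> = { Accept: "application/json" };
  const authed: Record<string, string> = { ...base, Authorization: `Bearer ${token}` };
  switch (tool) {
    case "get_status":
      // The status endpoint is unauthenticated, so no token is sent.
      return { method: "GET", url: `${BASE_URL}/status`, headers: base };
    case "get_order":
      return {
        method: "GET",
        url: `${BASE_URL}/orders/${encodeURIComponent(String(args.id))}`,
        headers: authed,
      };
    case "create_order":
      return {
        method: "POST",
        url: `${BASE_URL}/orders`,
        headers: { ...authed, "Content-Type": "application/json" },
        body: JSON.stringify(args),
      };
    default:
      throw new Error(`unknown tool: ${String(tool)}`);
  }
}

// Forward the request and hand the backend's answer back verbatim,
// including error responses, so the agent sees the real rejection reason.
export async function callTool(
  tool: ToolName,
  args: Record<string, unknown>,
  token: string,
): Promise<{ status: number; text: string }> {
  const call = toHttpCall(tool, args, token);
  const res = await fetch(call.url, {
    method: call.method,
    headers: call.headers,
    body: call.body,
  });
  return { status: res.status, text: await res.text() };
}
```

Because toHttpCall is a pure function, the whole bridge can be reviewed by reading one switch statement, which is the auditability property described above.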

Tokens are SHA-256 hashed, shown once

The token format is rk_live_<32-char-base64url> — Stripe-style. The prefix is fixed so GitHub Secret Scanning and similar tools can detect leaks. The body is 24 random bytes encoded as base64url, giving 192 bits of entropy.

On creation, the plain text is shown to the user exactly once, in a yellow warning box. The server stores only the SHA-256 hash. A 12-character prefix-hint (rk_live_abc1) is kept for display so the user can identify which token is which when multiple are active.
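A minimal sketch of issuance with Node's built-in crypto module. It assumes the SHA-256 is taken over the full plaintext, including the rk_live_ prefix, and that the hint is simply the first 12 characters. The IssuedToken interface and function names are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

export interface IssuedToken {
  plaintext: string;  // shown to the user exactly once, never persisted
  hashHex: string;    // SHA-256 of the plaintext; the only form stored
  prefixHint: string; // first 12 characters, e.g. "rk_live_abc1", for display
}

const PREFIX = "rk_live_";

export function sha256Hex(value: string): string {
  return createHash("sha256").update(value, "utf8").digest("hex");
}

export function issueToken(): IssuedToken {
  // 24 random bytes encode to exactly 32 base64url characters (192 bits).
  const body = randomBytes(24).toString("base64url");
  const plaintext = PREFIX + body;
  return {
    plaintext,
    hashHex: sha256Hex(plaintext),
    prefixHint: plaintext.slice(0, 12),
  };
}
```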

Comparison uses Node's timingSafeEqual to defeat timing oracles. Revocation is implemented as a soft delete (setting revoked_at), so the audit trail survives. It is immediate: the next API call with a revoked token returns 401 within milliseconds.
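A sketch of the check, assuming a hypothetical StoredToken row that holds the hex hash and a revoked_at-style timestamp. In practice the row is usually located by hashing the presented token and querying on that hash; comparing the two digests with timingSafeEqual then guards the final equality check.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical stored row: the hash column plus the soft-delete marker.
export interface StoredToken {
  hashHex: string;
  revokedAt: Date | null;
}

function sha256(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

// True only for a matching token that has not been revoked.
export function isTokenValid(presented: string, stored: StoredToken): boolean {
  const expected = Buffer.from(stored.hashHex, "hex");
  const actual = sha256(presented);
  // timingSafeEqual throws when lengths differ, so treat that as a mismatch.
  if (expected.length !== actual.length) return false;
  if (!timingSafeEqual(expected, actual)) return false;
  // Soft delete: the row still exists for the audit trail but is rejected.
  return stored.revokedAt === null;
}
```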

Five-layer spending defense

Most token-based systems have one or two spending limits. We have five, each addressing a different failure mode:

Daily limit per token (default $50 USDT, configurable $1-$1000). Cap on spend within the current calendar day.
Monthly limit per token (default $500, configurable $1-$10000). Cap on spend within the current calendar month.
Cool-off period — for the first 7 days after token creation, total spend is capped at $50 USDT regardless of the daily limit. Hard-coded; cannot be raised. This bounds the blast radius if a token leaks immediately after creation (for example, a developer pastes it into a public gist by mistake).
Big-transaction threshold (default $200) — purchases above this require manual confirmation by the human user in the dashboard. The API returns big_txn_needs_confirmation and the agent must surface this to the user.
Purchase scope opt-in — by default, the "Allow purchases" checkbox is unchecked on token creation, meaning the token can only call read endpoints. The user must consciously opt in. For an agent that only needs to query the catalog, this is the safest configuration.
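The configurable part of these layers can be captured as a per-token policy record. This sketch uses hypothetical names (SpendingPolicy, makePolicy) and validates only the ranges stated above. The cool-off values live in constants outside the record, which is one way to make "cannot be raised" structural rather than a runtime rule.

```typescript
// Hypothetical per-token policy record; amounts are in USDT.
export interface SpendingPolicy {
  dailyLimit: number;
  monthlyLimit: number;
  bigTxnThreshold: number;
  allowPurchases: boolean;
}

// Cool-off values are module constants, deliberately absent from
// SpendingPolicy, so no per-token override can raise them.
export const COOL_OFF_DAYS = 7;
export const COOL_OFF_CAP_USDT = 50;

export const DEFAULT_POLICY: Readonly<SpendingPolicy> = {
  dailyLimit: 50,
  monthlyLimit: 500,
  bigTxnThreshold: 200,
  allowPurchases: false, // read-only unless the user opts in
};

function assertRange(name: string, value: number, min: number, max: number): void {
  if (!Number.isFinite(value) || value < min || value > max) {
    throw new RangeError(`${name} must be between ${min} and ${max}, got ${value}`);
  }
}

// Apply user overrides on top of the defaults, rejecting out-of-range limits.
export function makePolicy(overrides: Partial<SpendingPolicy> = {}): SpendingPolicy {
  const policy: SpendingPolicy = { ...DEFAULT_POLICY, ...overrides };
  assertRange("dailyLimit", policy.dailyLimit, 1, 1000);
  assertRange("monthlyLimit", policy.monthlyLimit, 1, 10000);
  return policy;
}
```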

Counters live on the api_tokens table with lazy window rotation: when a request reads a counter whose stored window (YYYY-MM-DD for daily, YYYY-MM for monthly) does not match the current window, the counter is treated as zero. This avoids a daily reset cron job, and with it the risk of a reset that runs late or not at all.
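A minimal sketch of lazy window rotation. It assumes window keys are derived from UTC (the post does not say which time zone) and that amounts are plain USDT numbers; a production version would more likely use integer minor units and a single atomic UPDATE so that concurrent webhooks cannot race.

```typescript
// One counter = the window key it belongs to plus the amount spent in it.
export interface WindowedCounter {
  window: string; // "YYYY-MM-DD" for daily, "YYYY-MM" for monthly
  spent: number;
}

// Window keys derived from the current time (UTC is an assumption here).
export const dailyWindow = (now: Date): string => now.toISOString().slice(0, 10);
export const monthlyWindow = (now: Date): string => now.toISOString().slice(0, 7);

// A stale window reads as zero; nothing has to reset it on a schedule.
export function readCounter(c: WindowedCounter, current: string): number {
  return c.window === current ? c.spent : 0;
}

// Rotate (if needed) and add in one step, returning the value to persist.
export function incrementCounter(
  c: WindowedCounter,
  current: string,
  amount: number,
): WindowedCounter {
  return { window: current, spent: readCounter(c, current) + amount };
}
```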

Enforcement happens twice. At order creation time, the server runs a soft check: would amount + spent_today exceed the daily limit? If so, the request is rejected with a stable error code and a hint showing the remaining headroom. At payment webhook time, the counter is hard-incremented, and only once real USDT has actually been received.
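The two phases can be sketched as a pure evaluation at order creation plus a separate increment that only the verified payment webhook calls. Apart from big_txn_needs_confirmation, the error codes here are hypothetical, as are the Policy and Usage shapes and the order in which the layers are checked. Each limit rejection carries a remaining value, mirroring the headroom hint.

```typescript
// Hypothetical shapes; amounts in USDT.
export interface Policy {
  dailyLimit: number;
  monthlyLimit: number;
  bigTxnThreshold: number;
  allowPurchases: boolean;
}

export interface Usage {
  spentToday: number;     // already window-rotated
  spentThisMonth: number; // already window-rotated
  spentTotal: number;     // since token creation, for the cool-off cap
  tokenCreatedAt: Date;
}

export type Decision =
  | { ok: true }
  | { ok: false; code: string; remaining?: number };

const COOL_OFF_MS = 7 * 24 * 60 * 60 * 1000;
const COOL_OFF_CAP = 50;

// Soft check at order creation: evaluates every layer, writes nothing.
export function softCheck(
  policy: Policy,
  usage: Usage,
  amount: number,
  now: Date,
  userConfirmed = false,
): Decision {
  if (!policy.allowPurchases) return { ok: false, code: "purchase_not_allowed" };
  const inCoolOff = now.getTime() - usage.tokenCreatedAt.getTime() < COOL_OFF_MS;
  if (inCoolOff && usage.spentTotal + amount > COOL_OFF_CAP) {
    return { ok: false, code: "cool_off_limit_exceeded", remaining: Math.max(0, COOL_OFF_CAP - usage.spentTotal) };
  }
  if (usage.spentToday + amount > policy.dailyLimit) {
    return { ok: false, code: "daily_limit_exceeded", remaining: Math.max(0, policy.dailyLimit - usage.spentToday) };
  }
  if (usage.spentThisMonth + amount > policy.monthlyLimit) {
    return { ok: false, code: "monthly_limit_exceeded", remaining: Math.max(0, policy.monthlyLimit - usage.spentThisMonth) };
  }
  if (amount > policy.bigTxnThreshold && !userConfirmed) {
    return { ok: false, code: "big_txn_needs_confirmation" };
  }
  return { ok: true };
}

// Hard increment, intended to be called only from the verified payment webhook.
export function applyConfirmedPayment(usage: Usage, amount: number): Usage {
  return {
    ...usage,
    spentToday: usage.spentToday + amount,
    spentThisMonth: usage.spentThisMonth + amount,
    spentTotal: usage.spentTotal + amount,
  };
}
```

Keeping softCheck free of writes means an order that is never paid leaves no trace in the counters, which is the point of deferring the increment to the webhook.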

Payment integrity — HMAC and unique constraints

The single most important anti-fraud layer is HMAC-SHA-512 verification on the NowPayments webhook. The shared secret (IPN_SECRET) lives in the server environment. Every webhook payload arrives with an x-nowpayments-sig header; the server computes HMAC over the raw request body and compares. Without the secret, an attacker cannot forge a "paid" payload — even if they know the order ID and intent ID (they likely do, since these are returned in the order response).
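A minimal sketch of the verification step in Node. It assumes, as NowPayments' IPN documentation describes, that the signature covers a JSON serialization of the payload with keys sorted alphabetically rather than the bytes exactly as received, so the sketch re-serializes the parsed body before hashing (it also sorts nested objects). If the provider's current docs specify a different canonical form, sign that string instead. The function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Recursively sort object keys so logically equal payloads serialize identically.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) sorted[key] = sortKeys(obj[key]);
    return sorted;
  }
  return value;
}

// HMAC-SHA-512 over the key-sorted JSON, hex-encoded.
export function signIpnPayload(rawBody: string, secret: string): string {
  const canonical = JSON.stringify(sortKeys(JSON.parse(rawBody)));
  return createHmac("sha512", secret).update(canonical, "utf8").digest("hex");
}

// Check the x-nowpayments-sig header value. Any parse failure or mismatch
// is treated as "not paid".
export function verifyIpnSignature(
  rawBody: string,
  signatureHex: string | undefined,
  secret: string,
): boolean {
  if (!signatureHex) return false;
  let expected: Buffer;
  try {
    expected = Buffer.from(signIpnPayload(rawBody, secret), "hex");
  } catch {
    return false; // body was not valid JSON
  }
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual requires equal lengths; a wrong length is simply a mismatch.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```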

The second layer is a DB-level UNIQUE constraint on (ref_type, ref_id) in the ledger table. Every credit is written with ref_type="nowpayments" and ref_id=<payment_id>. If the webhook fires twice for the same payment_id (network retry, replay attack), the second insert fails the unique constraint and the application short-circuits with alreadyApplied=true. No double-credit is possible.
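The pattern can be modeled in a few lines. This sketch uses an in-memory Map as a stand-in for the ledger table's UNIQUE (ref_type, ref_id) index, plus a hypothetical creditPayment function. In a real database the constraint is what makes the check atomic under concurrent webhook deliveries; an in-process Map cannot guarantee that across multiple server instances.

```typescript
// Hypothetical ledger row; refType/refId mirror the ref_type/ref_id columns.
interface LedgerRow {
  refType: string;
  refId: string;
  userId: string;
  amount: number;
}

const UNIQUE_VIOLATION = "UNIQUE_VIOLATION";

// In-memory stand-in for a table with UNIQUE (ref_type, ref_id).
export class Ledger {
  private readonly rows = new Map<string, LedgerRow>();

  insert(row: LedgerRow): void {
    const key = JSON.stringify([row.refType, row.refId]);
    if (this.rows.has(key)) {
      // Mimic a database unique-constraint error.
      throw Object.assign(new Error(`duplicate ${key}`), { code: UNIQUE_VIOLATION });
    }
    this.rows.set(key, row);
  }

  balance(userId: string): number {
    let total = 0;
    this.rows.forEach((row) => {
      if (row.userId === userId) total += row.amount;
    });
    return total;
  }
}

export function creditPayment(
  ledger: Ledger,
  userId: string,
  paymentId: string,
  amount: number,
): { alreadyApplied: boolean } {
  try {
    ledger.insert({ refType: "nowpayments", refId: paymentId, userId, amount });
    return { alreadyApplied: false };
  } catch (err) {
    // A retried or replayed webhook lands here instead of crediting twice.
    if (err instanceof Error && (err as Error & { code?: string }).code === UNIQUE_VIOLATION) {
      return { alreadyApplied: true };
    }
    throw err;
  }
}
```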

The third layer is architectural: only the services/billing.ts functions can mutate the eSIM balance. There is no admin endpoint to arbitrarily add balance; even the "manually activate eSIM" admin tool goes through the same credit path. This minimizes the attack surface — there is exactly one code path through which balance can increase, and it is the one with all the safety checks.

Three kill switches for incident response

Things will go wrong. Tokens leak. A user's agent malfunctions and spams orders. A payment provider has an outage. We built three independent kill switches so the response can be proportional:

Per-token revoke — user clicks Revoke in the dashboard. Affects only that token. Used when one agent is misbehaving or a token leaked.
Per-user agent-block — admin calls POST /api/admin/users/:id/agent-block with a reason. Affects all of that user's tokens. Used for individual account abuse or compromise.
Global agent pause — admin calls POST /api/admin/agents/pause. Every request authenticated with a Bearer token, for every user, returns 503. Used for infrastructure incidents or security investigations.

Status is visible to agents at GET /api/v1/status (no auth, CORS *). Well-behaved agents poll this before purchases and back off when purchases_paused or agents_paused is true. Roamzy reserves the right to throttle agents that ignore the status endpoint.
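A minimal agent-side sketch of that etiquette. It assumes the status response is a JSON object carrying the two boolean flags named above at the top level (the real payload may contain more fields or nest them). It fails closed: if the endpoint is unreachable or a flag is missing, the agent treats purchases as paused. The backoff values are illustrative, not Roamzy guidance.

```typescript
// Assumed response shape: only the two flags named in the text are relied on.
export interface AgentStatus {
  agents_paused: boolean;
  purchases_paused: boolean;
}

// Fail closed: a missing or non-boolean flag is treated as "paused".
export function parseStatus(body: unknown): AgentStatus {
  const b = (body ?? {}) as Record<string, unknown>;
  return {
    agents_paused: b.agents_paused !== false,
    purchases_paused: b.purchases_paused !== false,
  };
}

export function canPurchase(s: AgentStatus): boolean {
  return !s.agents_paused && !s.purchases_paused;
}

export async function fetchStatus(baseUrl = "https://roamzy.io"): Promise<AgentStatus> {
  try {
    // No Authorization header: the status endpoint is unauthenticated.
    const res = await fetch(`${baseUrl}/api/v1/status`);
    if (!res.ok) return parseStatus(undefined);
    return parseStatus(await res.json());
  } catch {
    return parseStatus(undefined); // network failure: assume paused
  }
}

// Exponential backoff between status re-checks, capped (illustrative values).
export function backoffMs(attempt: number, baseMs = 5_000, capMs = 300_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```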

There is also a more granular pause: agents.purchases.paused blocks only POST /orders, leaving read endpoints alive. Useful when investigating a payment-side issue without blacking out informational API calls.
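How the switches compose can be sketched as a single request gate, checked from the widest scope to the narrowest. The 503 for a global pause and the 401 for a revoked token come from the text above; the 403 for a blocked user and the 503 for a purchases-only pause are assumptions, as are the KillSwitches and TokenContext shapes.

```typescript
// Hypothetical switch state, as it might be loaded per request.
export interface KillSwitches {
  agentsPaused: boolean;    // global agent pause
  purchasesPaused: boolean; // agents.purchases.paused
}

export interface TokenContext {
  revokedAt: Date | null;    // per-token revoke (soft delete)
  userAgentBlocked: boolean; // per-user agent-block
}

export type GateResult =
  | { allow: true }
  | { allow: false; status: number; reason: string };

export function gate(
  sw: KillSwitches,
  token: TokenContext,
  method: string,
  path: string,
): GateResult {
  // Widest switch first: during a global pause no per-token state matters.
  if (sw.agentsPaused) return { allow: false, status: 503, reason: "agents_paused" };
  if (token.revokedAt !== null) return { allow: false, status: 401, reason: "token_revoked" };
  // 403 for a blocked user is an assumption; the post does not state the code.
  if (token.userAgentBlocked) return { allow: false, status: 403, reason: "user_agent_blocked" };
  // Granular pause: only order creation is blocked; read endpoints stay up.
  if (sw.purchasesPaused && method === "POST" && path === "/api/v1/orders") {
    return { allow: false, status: 503, reason: "purchases_paused" };
  }
  return { allow: true };
}
```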

QR is the primary activation surface, not LPA

Most agent flows run on a different device than where the eSIM lives. The agent might be Claude Desktop on a Mac, while the eSIM needs to land on the user's iPhone. In this cross-device case, QR is the right surface: the API returns qr_image_url (a PNG URL via the qrserver.com renderer), the agent embeds it inline in the chat, and the user scans with their phone's camera.
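A sketch of how such a URL could be assembled. It assumes the activation code follows the standard GSMA SGP.22 layout LPA:1$<SM-DP+ address>$<matching ID> and that the image comes from api.qrserver.com's create-qr-code endpoint. The SM-DP+ address, matching ID, and 300x300 size used here are placeholders, not values from the Roamzy API.

```typescript
// Standard activation-code layout: "LPA:1$<SM-DP+ address>$<matching ID>".
export function activationCode(smdpAddress: string, matchingId: string): string {
  return `LPA:1$${smdpAddress}$${matchingId}`;
}

// PNG QR image via api.qrserver.com; URLSearchParams percent-encodes ':' and '$'.
export function qrImageUrl(data: string, sizePx = 300): string {
  const url = new URL("https://api.qrserver.com/v1/create-qr-code/");
  url.searchParams.set("size", `${sizePx}x${sizePx}`);
  url.searchParams.set("data", data);
  return url.toString();
}
```

For the placeholder values, qrImageUrl(activationCode("smdp.example.com", "ABC-123")) yields a URL whose data parameter decodes back to LPA:1$smdp.example.com$ABC-123, which is what the phone's camera reads when the user scans.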

The lpa_url field is also returned, but it is only useful when the agent and the target device are the same, e.g. an agent running on the user's iPhone. On iOS 17.4+ and Android 14+, tapping an lpa: URL opens the system eSIM installer directly. For most agent UX, though, the QR is more reliable.

This is a small design choice, but it matters: many MCP servers are written with a "browser tab" mental model, in which the user sits next to the agent and taps a link. In reality, consumer eSIM use is cross-device.

Why USDT, not fiat

Crypto payments are agent-native for three reasons that are not always obvious:

No card storage compliance. Storing card details pulls the whole system into PCI DSS scope, plus card-network rules about how the cardholder must authenticate each transaction. Most of these rules assume a human is at the keyboard. Crypto sidesteps the entire regime.
No chargeback risk. Once a USDT transaction is confirmed on-chain, it is final. There is no "the cardholder disputed this transaction six months later" path. This matters more for agent commerce than for browser commerce, because the human user might not have been the one who reviewed and approved the spend in real time.
Settlement clarity. The agent budget is denominated in USDT. The user wallet holds USDT. The merchant settles in USDT. There is no fiat conversion in between, no exchange rate exposure, no banking-hours delay. The whole flow is API-shaped from end to end.

This does narrow the addressable market — most non-crypto-native travelers will not buy an eSIM in USDT. But for the cohort that does (crypto-native travelers, agent-driven flows, developers automating travel), it removes a stack of friction.

What we did not build (and why)

OAuth-style scoped tokens. We have a single permission flag (purchase / no purchase) instead of a granular scope list. The reasoning: agents in 2026 do not yet need a "read-balance-only" or "read-eSIM-only" distinction. If they do later, we will add scopes. Premature scope models create more bugs than they prevent.

Anonymous agent-first sign-up. The agent currently cannot create a Roamzy user on the fly; the human has to sign in once in a browser to issue a token. This is friction. We have it in the backlog as agent-first flow (anonymous user creation + magic-link claim), but we are waiting for traffic data before building. If 50% of potential users abandon at "create account", that is signal worth optimizing for; if 5%, it is not.

Webhook delivery for agent events. Currently the agent polls GET /orders/:id for status updates. A push-based webhook would be more efficient at scale, but it adds complexity: the agent has to host an endpoint, verify signatures, and implement a retry policy. Polling at 5-second intervals during a 10-minute USDT confirmation window is 120 requests (600 s / 5 s), which is cheap.
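A sketch of that polling budget. It assumes an injected getStatus callback that would wrap GET /api/v1/orders/:id with the agent's Bearer token; the status strings and the isTerminal predicate are hypothetical. The wait function is injectable so the loop can be exercised without real delays.

```typescript
export interface PollOptions {
  intervalMs: number; // 5 s in the post
  timeoutMs: number;  // 10-minute USDT confirmation window
}

// Upper bound on requests: 600_000 / 5_000 = 120.
export function maxPolls(opts: PollOptions): number {
  return Math.floor(opts.timeoutMs / opts.intervalMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until isTerminal(status) or the request budget runs out.
export async function pollOrder(
  getStatus: () => Promise<string>,
  isTerminal: (status: string) => boolean,
  opts: PollOptions,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<string> {
  const budget = maxPolls(opts);
  let last = "unknown";
  for (let i = 0; i < budget; i++) {
    last = await getStatus();
    if (isTerminal(last)) return last;
    if (i < budget - 1) await wait(opts.intervalMs);
  }
  // Still non-terminal after the window; the caller decides what to tell the user.
  return last;
}
```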

Try it yourself

If you are building an AI agent and want to integrate eSIM purchases, the fastest path is the MCP server. Install in Claude Desktop in 60 seconds — see the tutorial. For non-MCP clients, the full OpenAPI 3.0 spec is at /api/v1/openapi.json, the interactive Swagger UI is at /api/v1/docs, and the long-form agent guidance is at /llms-full.txt.

For deeper questions, the Telegram support bot is at @roamzy_support_bot.