
Rate limits

Per-project, per-route token buckets — and how to stay under them.

Rate limits are per project, per route, enforced by an in-memory token bucket on the Cloudflare Worker. There's no per-key or per-IP limit on top.

The numbers

| Endpoint | Refill rate | Burst capacity |
|---|---|---|
| POST /v1/track | 50 / sec | 200 |
| POST /v1/batch | 5 / sec | 20 |
| GET /v1/query | unlimited | n/a |
| GET /v1/stats | unlimited | n/a |
| GET /v1/sources | unlimited | n/a |
| POST /v1/identify | unlimited | n/a |
| POST /v1/forget | unlimited | n/a |
| POST /mcp | inherits the underlying tool's limit | (inherited) |
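To make "refill rate" and "burst capacity" concrete, here is a minimal token-bucket sketch. It is not the code in apps/api/src/auth/rateLimit.ts: the `TokenBucket` class, the `LIMITS` table, and the `allow()` helper are hypothetical, and it assumes a new bucket starts full. Each request spends one token, tokens refill continuously at the refill rate up to the capacity, and an empty bucket means a 429. Time is passed in explicitly so the behaviour is easy to test.

```typescript
// Illustrative token bucket; NOT the Worker's actual implementation.
// Route limits copied from the table above; routes not listed are unlimited.
const LIMITS: Record<string, { refillPerSec: number; capacity: number }> = {
  "POST /v1/track": { refillPerSec: 50, capacity: 200 },
  "POST /v1/batch": { refillPerSec: 5, capacity: 20 },
};

class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly refillPerSec: number,
    private readonly capacity: number,
    nowMs: number,
  ) {
    // Assumption: a new bucket starts full, so a burst of `capacity` passes at once.
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Spend one token if available; false means the bucket is empty (-> 429).
  tryTake(nowMs: number): boolean {
    // Refill for the time elapsed since the last call, capped at capacity.
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One bucket per (project, route) pair, held in this process's memory only.
const buckets = new Map<string, TokenBucket>();

function allow(projectId: string, route: string, nowMs: number): boolean {
  const limit = LIMITS[route];
  if (limit === undefined) return true; // unlimited route
  const key = `${projectId} ${route}`;
  let bucket = buckets.get(key);
  if (bucket === undefined) {
    bucket = new TokenBucket(limit.refillPerSec, limit.capacity, nowMs);
    buckets.set(key, bucket);
  }
  return bucket.tryTake(nowMs);
}
```

With these numbers, 200 /v1/track requests at the same instant all pass, the 201st is refused, and one second later another 50 pass. Because `buckets` lives in a single process's memory, every Worker instance would hold its own copy, which is why the implementation note below describes the effective limit as roughly rate × N_instances.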

When a bucket is empty, the Worker returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: application/json

{ "error": "rate_limited", "retry_after_s": 1 }
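If you call the API directly rather than through an SDK, you need to turn that response into a wait time. A minimal sketch, assuming you read the header with the Fetch API's `res.headers.get("Retry-After")` and the body with `await res.json()`. The `retryDelayMs` name and the 1-second fallback are choices made here, not part of the API. It also accepts the HTTP-date form of Retry-After that HTTP allows, even though the Worker is shown sending plain seconds.

```typescript
// Shape of the 429 body shown above; fields are optional in case the body is absent or different.
interface RateLimitedBody {
  error?: string;
  retry_after_s?: number;
}

// Hypothetical helper: milliseconds to wait before retrying after a 429.
// Prefers the Retry-After header, then the body's retry_after_s, then 1 second.
function retryDelayMs(retryAfterHeader: string | null, body: RateLimitedBody | null): number {
  if (retryAfterHeader !== null) {
    const trimmed = retryAfterHeader.trim();
    // delay-seconds form, e.g. "Retry-After: 1"
    if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
    // HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
    const dateMs = Date.parse(trimmed);
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - Date.now());
  }
  if (body !== null && typeof body.retry_after_s === "number" && body.retry_after_s >= 0) {
    return body.retry_after_s * 1000;
  }
  return 1000; // fallback matching the 1-second value the Worker sends
}
```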

Why these numbers

The headline workload is browser SDKs flushing buffered events. A typical session sends fewer than 10 events. 50 requests/sec sustained per project leaves headroom for roughly 5,000 concurrent active visitors per project, which works out to an average of one request per visitor every 100 seconds.

/v1/batch is rate-limited more tightly because each call delivers up to 1000 events: 5/sec × 1000 = up to 5,000 events/sec sustained, and a 20-call burst delivers up to 20,000 events at once.

How to stay under

Use the SDK

The browser SDK batches automatically (every 20 events or 2 seconds, whichever comes first) and flushes on pagehide via sendBeacon. You'll never hit the limits from a browser unless you're calling track() in a tight loop.
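The batching policy above can be sketched as a small buffer that flushes on whichever trigger fires first. This is an illustration under assumptions, not the SDK's source: `EventBatcher`, `TrackEvent`, and the `send` callback are hypothetical names. The browser wiring (`navigator.sendBeacon` on `pagehide`) is shown only as comments, with `ENDPOINT_URL` as a placeholder, so the core logic also runs outside a browser.

```typescript
// Hypothetical event shape; the real SDK's payload may differ.
type TrackEvent = { name: string; props?: Record<string, unknown> };

// Sketch of "flush every 20 events or 2 seconds, whichever comes first".
class EventBatcher {
  private buffer: TrackEvent[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private readonly send: (events: TrackEvent[]) => void,
    private readonly maxEvents = 20,
    private readonly maxWaitMs = 2000,
  ) {}

  track(event: TrackEvent): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxEvents) {
      this.flush(); // size trigger
    } else if (this.timer === null) {
      // The first event of a new batch starts the time trigger.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  // Send everything buffered now and cancel any pending timer.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}

// Browser wiring (illustrative only; ENDPOINT_URL is a placeholder):
// const batcher = new EventBatcher((events) => {
//   navigator.sendBeacon(ENDPOINT_URL, JSON.stringify(events));
// });
// addEventListener("pagehide", () => batcher.flush());
```

Calling `track()` 25 times in a row produces one 20-event send immediately; the remaining 5 go out when the 2-second timer fires or on `pagehide`, whichever happens first. Even a burst of activity therefore costs one request per 20 events rather than one per event.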

The Node SDK is single-event-per-call by default (flushAt: 1). For high-volume servers, set flushAt: 50 and the SDK switches to /v1/batch.

Backfill via /v1/batch

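Backfilling 100k events through /v1/track, one event per call, would take about 33 minutes of wall-clock time at 50/sec (100,000 ÷ 50 = 2,000 seconds). Through /v1/batch with chunks of 1000 events:

100k events = 100 batch calls. Pacing one call every 200 ms (the 5/sec refill rate) takes 100 × 200 ms = 20 seconds.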
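A backfill loop that follows this arithmetic might look like the sketch below. `backfill` and `postBatch` are hypothetical names; `postBatch` stands in for however you issue POST /v1/batch (an SDK call or a raw `fetch`). The loop paces calls at one per 200 ms, matching the 5/sec refill rate, so a single backfill job stays under the limit without relying on the 20-call burst.

```typescript
// Hypothetical backfill helper: send events in chunks of up to `chunkSize`,
// starting at most one call per `intervalMs` (200 ms = 5 calls/sec).
async function backfill<T>(
  events: T[],
  postBatch: (chunk: T[]) => Promise<void>,
  chunkSize = 1000,
  intervalMs = 200,
): Promise<number> {
  let calls = 0;
  for (let i = 0; i < events.length; i += chunkSize) {
    const started = Date.now();
    await postBatch(events.slice(i, i + chunkSize));
    calls += 1;
    // Wait out whatever is left of this call's interval, except after the last chunk.
    const remaining = intervalMs - (Date.now() - started);
    if (remaining > 0 && i + chunkSize < events.length) {
      await new Promise<void>((resolve) => setTimeout(() => resolve(), remaining));
    }
  }
  return calls; // number of POST /v1/batch calls made
}
```

For 100,000 events this makes 100 calls spaced about 200 ms apart, roughly the 20 seconds estimated above. If other clients share the project's /v1/batch bucket, slow the pace further or combine this with the retry handling described under Backoff strategy.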

Spread across projects

Limits are per project. If you have multiple projects in one organisation, traffic to each gets its own bucket.

Implementation note

The bucket is currently in-memory per Worker instance. Cloudflare scales Workers horizontally, so the effective limit for a project at very high traffic is roughly (rate × N_instances). That over-counts in practice but it's the right ceiling to plan against.

For multi-tenant fairness at scale, swap the in-memory bucket for a Durable Object (see apps/api/src/auth/rateLimit.ts). That's a one-file change; it hasn't been made yet because the in-memory bucket hasn't been the bottleneck.

Backoff strategy

Honour Retry-After: wait at least the number of seconds it specifies before sending the request again.
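A caller-side sketch of honouring Retry-After, for code that talks to the API without the SDK or that catches the SDK's 429. `withRetryAfter`, `ResponseLike`, the 3-attempt cap, and the up-to-250 ms jitter are assumptions made for illustration. The attempt cap and jitter matter because of the reason given below: many concurrent callers retrying against one per-project bucket at the same instant just keep it empty.

```typescript
// Minimal shape shared with the Fetch API's Response, so `() => fetch(...)` works as `request`.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

// Hypothetical retry wrapper: on 429 or 5xx, wait Retry-After seconds (default 1)
// plus random jitter, then try again, giving up after `maxAttempts` total attempts.
async function withRetryAfter<R extends ResponseLike>(
  request: () => Promise<R>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<R> {
  for (let attempt = 1; ; attempt++) {
    const res = await request();
    const retryable = res.status === 429 || res.status >= 500;
    // Return successes, non-retryable errors, and the final failed attempt as-is.
    if (!retryable || attempt >= maxAttempts) return res;
    const header = res.headers.get("Retry-After");
    const seconds = header !== null && /^\d+$/.test(header.trim()) ? Number(header.trim()) : 1;
    // Jitter spreads concurrent callers out so they don't all retry in the same instant.
    const jitterMs = Math.random() * 250;
    await sleep(seconds * 1000 + jitterMs);
  }
}
```

Because the final response is returned rather than thrown, the caller still decides what to do once the attempts run out, for example dropping the event or queueing it for later.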

The Node SDK does this automatically on 5xx. It does not automatically retry 429 (since the bucket is per-project, retries from many concurrent requests just keep the bucket empty). For 429, the SDK throws and lets you decide.
