# Rate limits

Rate limits are **per project, per route**, enforced by an in-memory token bucket on the Cloudflare Worker. There's no per-key or per-IP limit on top.

## The numbers

| Endpoint            | Refill rate                          | Burst capacity |
| ------------------- | ------------------------------------ | -------------- |
| `POST /v1/track`    | 50 / sec                             | 200            |
| `POST /v1/batch`    | 5 / sec                              | 20             |
| `GET /v1/query`     | unlimited                            | n/a            |
| `GET /v1/stats`     | unlimited                            | n/a            |
| `GET /v1/sources`   | unlimited                            | n/a            |
| `POST /v1/identify` | unlimited                            | n/a            |
| `POST /v1/forget`   | unlimited                            | n/a            |
| `POST /mcp`         | inherits the underlying tool's limit | —              |

When a bucket is empty, the Worker returns:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: application/json

{ "error": "rate_limited", "retry_after_s": 1 }
```
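
For intuition, the refill rate and burst capacity in the table behave like the token bucket below. This is a minimal sketch, not the Worker's actual code: `TokenBucket`, `allow`, and the `project:route` map key are hypothetical names chosen for illustration, and the real implementation may differ in detail.

```ts
// Hypothetical sketch of a per-project, per-route token bucket.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly ratePerSec: number;
  private readonly capacity: number;

  constructor(ratePerSec: number, capacity: number, now: number = Date.now()) {
    this.ratePerSec = ratePerSec;
    this.capacity = capacity;
    this.tokens = capacity; // a fresh bucket starts full
    this.last = now;
  }

  // Returns true if the request may proceed, false if it should get a 429.
  tryTake(now: number = Date.now()): boolean {
    const elapsedSec = Math.max(0, now - this.last) / 1000;
    // Refill for the time elapsed, never above the burst capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One bucket per (project, route) pair, held in memory.
const buckets = new Map<string, TokenBucket>();

function allow(
  projectId: string,
  route: string,
  ratePerSec: number,
  capacity: number,
  now: number = Date.now(),
): boolean {
  const key = `${projectId}:${route}`;
  let bucket = buckets.get(key);
  if (!bucket) {
    bucket = new TokenBucket(ratePerSec, capacity, now);
    buckets.set(key, bucket);
  }
  return bucket.tryTake(now);
}
```

With `ratePerSec = 50` and `capacity = 200`, a project can send 200 `/v1/track` requests back to back, after which requests are admitted at 50 per second as the bucket refills.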

## Why these numbers

The headline workload is browser SDKs flushing buffered events. A typical session sends fewer than 10 events, so 50/sec sustained per project is enough headroom for \~5,000 concurrent active visitors *per project*.

`/v1/batch` is rate-limited more tightly because each call delivers up to 1,000 events: 5/sec × 1,000 = 5,000 events/sec sustained, and a 20-call burst delivers 20,000 events at once.

## How to stay under

### Use the SDK

The [browser SDK](/sdks/browser.md) batches automatically (every 20 events or 2 seconds, whichever comes first) and flushes on `pagehide` via `sendBeacon`. You'll never hit the limits from a browser unless you're calling `track()` in a tight loop.

The [Node SDK](/sdks/node.md) is single-event-per-call by default (`flushAt: 1`). For high-volume servers, set `flushAt: 50` and the SDK switches to `/v1/batch`.

```ts
init({ key, host, flushAt: 50, flushIntervalMs: 2000 });
```

### Backfill via /v1/batch

Backfilling 100k events through `/v1/track` would take \~33 minutes wall-clock at 50/sec. Through `/v1/batch` with chunks of 1000 events:

```ts
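// Send the backfill in chunks of 1000 events, pacing calls to stay near 5/sec.
for (let i = 0; i < events.length; i += 1000) {
  const chunk = events.slice(i, i + 1000);
  const res = await fetch("/v1/batch", { /* ... */ body: JSON.stringify({ events: chunk }) });
  // Stop on any failure so a rejected chunk is not silently skipped.
  if (!res.ok) throw new Error(`batch at offset ${i} failed with ${res.status}`);
  await new Promise(r => setTimeout(r, 200));   // ~5/sec
}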
```

100k events take 100 batch calls; at one call every 200 ms that is about 20 seconds, plus request latency.
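
The same arithmetic can be written as a small helper for sizing other backfills. This is only a sketch: `estimateBackfillSeconds` is a hypothetical name, and the estimate assumes you pace calls at exactly the refill rate, ignoring both the burst capacity and per-request latency.

```ts
// Hypothetical helper: rough wall-clock time for a backfill paced at the refill rate.
function estimateBackfillSeconds(
  totalEvents: number,
  eventsPerCall: number,
  callsPerSec: number,
): number {
  // Number of HTTP calls needed, rounding up for a final partial chunk.
  const calls = Math.ceil(totalEvents / eventsPerCall);
  return calls / callsPerSec;
}

// /v1/track: one event per call at 50 calls/sec.
const viaTrack = estimateBackfillSeconds(100_000, 1, 50); // 2000 seconds, about 33 minutes
// /v1/batch: 1000 events per call at 5 calls/sec.
const viaBatch = estimateBackfillSeconds(100_000, 1000, 5); // 20 seconds
```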

### Spread across projects

Limits are per project. If you have multiple projects in one organisation, traffic to each gets its own bucket.

## Implementation note

The bucket is currently **in-memory per Worker instance**. Cloudflare scales Workers horizontally, so the *effective* limit for a project at very high traffic is roughly `(rate × N_instances)`. That figure overestimates what you will see in practice, but it's the right ceiling to plan against.

For multi-tenant fairness at scale, swap the in-memory bucket for a Durable Object — see `apps/api/src/auth/rateLimit.ts`. That's a one-file change when needed; it isn't yet because the current setup hasn't been the bottleneck.

## Backoff strategy

Honour `Retry-After`:

```ts
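async function trackWithRetry(payload: unknown, retries = 3): Promise<Response> {
  const res = await fetch("/v1/track", { /* ... */ body: JSON.stringify(payload) });
  if (res.status === 429 && retries > 0) {
    // Retry-After is in seconds; fall back to 1 s if it is missing or not a number.
    const seconds = Number(res.headers.get("Retry-After") ?? "1");
    const wait = (Number.isFinite(seconds) && seconds >= 0 ? seconds : 1) * 1000;
    await new Promise(r => setTimeout(r, wait));
    return trackWithRetry(payload, retries - 1);
  }
  return res; // success, a non-429 error, or a 429 with no retries left
}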
```

The Node SDK retries automatically on `5xx`. It does **not** automatically retry `429`: since the bucket is per project, retries from many concurrent requests just keep the bucket empty. On a `429`, the SDK throws and lets you decide.
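
If you do retry `429` yourself from many concurrent callers, one common way to avoid them all retrying at the same instant is to add exponential backoff and random jitter on top of `Retry-After`. The sketch below is an assumption-laden illustration, not SDK behaviour: `retryDelayMs` is a hypothetical helper, and the doubling factor and jitter range are arbitrary choices.

```ts
// Hypothetical helper: how long to wait before retry number `attempt` (0-based).
function retryDelayMs(
  retryAfter: string | null,
  attempt: number,
  rand: () => number = Math.random,
): number {
  const parsed = Number(retryAfter);
  // Use the header when it is a positive number of seconds, else assume 1 s.
  const baseSec = retryAfter !== null && Number.isFinite(parsed) && parsed > 0 ? parsed : 1;
  // Double the wait on each attempt so repeated 429s back off further.
  const backoffSec = baseSec * 2 ** attempt;
  // Spread concurrent callers over up to one extra base interval.
  const jitterSec = rand() * baseSec;
  return Math.round((backoffSec + jitterSec) * 1000);
}
```

A caller would pass `res.headers.get("Retry-After")` and its own attempt counter, then sleep for the returned number of milliseconds before sending again.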

## See also

* [Errors](/reference/errors.md) — `rate_limited` and friends.
* [POST /v1/batch](/api-reference/batch.md) — the cheap way to send many events.
* [Node SDK](/sdks/node.md) — batching & retry behaviour.

