# Link anonymous → known users

A visitor lands from a Facebook ad on Tuesday. Browses for a week. Signs up on the following Monday. You want to credit that signup to the original Facebook click, *and* to see everything they did before signing up.

This is the canonical "anonymous-to-known" stitch.

## What's happening under the hood

```
Day 1 14:00  $pageview        anon=u_abc  user=NULL    source=facebook/paid
Day 1 14:02  clicked_pricing  anon=u_abc  user=NULL    source=facebook/paid
Day 5 09:30  $pageview        anon=u_abc  user=NULL    source=direct/direct
Day 7 11:14  $identify        anon=u_abc  user=user_42                 ← here
Day 7 11:14  signup           anon=u_abc  user=user_42
Day 9 16:22  feature_used     anon=u_abc  user=user_42
```

`anonymous_id` is the same throughout. `user_id` only appears from the `$identify` onward. Historical events for `u_abc` aren't rewritten — they remain `user_id = NULL`. Stitching is a query-time join.
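To make the query-time join concrete, here is a minimal in-memory sketch in TypeScript. The `TrackedEvent` shape, the helper names, and the dates are illustrative assumptions, not part of the Millimetric SDK or schema; the point is only that the link is derived from events that carry a `user_id`, and the stored anonymous events are never modified.

```ts
// Hypothetical in-memory event shape mirroring the trace above.
interface TrackedEvent {
  timestamp: string; // ISO 8601 in UTC, so string order equals time order
  eventName: string;
  anonymousId: string;
  userId: string | null;
}

// Derive anonymous_id -> user_ids links from events that carry a user_id.
function buildLinks(events: TrackedEvent[]): Map<string, Set<string>> {
  const links = new Map<string, Set<string>>();
  for (const e of events) {
    if (e.userId === null) continue;
    const users = links.get(e.anonymousId) ?? new Set<string>();
    users.add(e.userId);
    links.set(e.anonymousId, users);
  }
  return links;
}

// Return every event attributable to userId, including pre-identify ones.
function eventsForUser(events: TrackedEvent[], userId: string): TrackedEvent[] {
  const links = buildLinks(events);
  return events
    .filter((e) => links.get(e.anonymousId)?.has(userId) ?? false)
    .sort((a, b) => a.timestamp.localeCompare(b.timestamp));
}

// Made-up sample data: one visitor who identifies, one who never does.
const trace: TrackedEvent[] = [
  { timestamp: "2024-01-02T14:00:00Z", eventName: "$pageview", anonymousId: "u_abc", userId: null },
  { timestamp: "2024-01-08T11:14:00Z", eventName: "$identify", anonymousId: "u_abc", userId: "user_42" },
  { timestamp: "2024-01-08T11:14:01Z", eventName: "signup", anonymousId: "u_abc", userId: "user_42" },
  { timestamp: "2024-01-03T10:00:00Z", eventName: "$pageview", anonymousId: "u_other", userId: null },
];

const stitched = eventsForUser(trace, "user_42");
```

Here `stitched` contains the anonymous `$pageview` alongside the identified events, while the entry in `trace` still has `userId: null`.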

## Step 1 — call /v1/identify on signup or login

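This call is the single most important moment in the flow: it creates the `anonymous_id` → `user_id` link that every query below depends on.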

### From the browser

```ts
import { identify } from "@millimetric/track";

// after your auth flow resolves
identify(user.id, { email: user.email, plan: user.plan });
```

The browser SDK emits a `$identify` event and tags every subsequent `track()` with `user_id`.

### From the server (recommended for signups)

When the user is created in your DB, also POST identify:

```ts
// A plain HTTP call: this request needs no SDK import.
const res = await fetch(`${process.env.AOA_HOST}/v1/identify`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.AOA_SK}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    anonymous_id: req.cookies.aid,           // threaded from the browser
    user_id: createdUser.id,
    traits: { plan: createdUser.plan, email: createdUser.email }
  })
});

// Surface failures so a missed identify does not go unnoticed.
if (!res.ok) {
  console.error(`identify failed with status ${res.status}`);
}
```

(See [Server-side events](/recipes/server-side.md) for how to thread `anonymous_id`.)
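If your server framework does not parse cookies for you, the threading step might look like the sketch below. Both helpers are hypothetical; only the cookie name `aid` and the field names `anonymous_id`, `user_id`, and `traits` come from the example above. Skipping the request when the cookie is missing is an assumption about what you want, not documented endpoint behaviour.

```ts
// Extract one cookie value from a raw Cookie header (hypothetical helper).
function readCookie(header: string | undefined, name: string): string | undefined {
  if (!header) return undefined;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === name) {
      return decodeURIComponent(part.slice(eq + 1).trim());
    }
  }
  return undefined;
}

// Build the JSON body for POST /v1/identify, or null when there is no
// anonymous id to link (assumption: the caller then skips the request).
function identifyBodyFromCookies(
  cookieHeader: string | undefined,
  userId: string,
  traits: Record<string, string>,
): string | null {
  const anonymousId = readCookie(cookieHeader, "aid");
  if (!anonymousId) return null;
  return JSON.stringify({ anonymous_id: anonymousId, user_id: userId, traits });
}

const identifyBody = identifyBodyFromCookies("theme=dark; aid=u_abc", "user_42", { plan: "pro" });
```

The returned string can be passed as the `body` of the `fetch` call shown above.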

## Step 2 — use the same anonymous\_id post-login

After identify, the browser SDK keeps the same `anonymous_id`. Don't generate a new one — that breaks the link.

For server-rendered apps, persist the id in a first-party cookie (see [server-side recipe](/recipes/server-side.md)) so it survives logout/login on the same device.
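As a rough sketch of that persistence rule: reuse whatever id the request already carries, and mint one only for a visitor with no cookie. The helper names and the cookie attributes (`Max-Age`, `SameSite`, `Secure`) are assumptions chosen for illustration; the cookie name `aid` follows the server example in Step 1.

```ts
const ONE_YEAR_SECONDS = 60 * 60 * 24 * 365;

// Reuse the id the browser already has; mint one only when none exists.
function ensureAnonymousId(existing: string | undefined, mint: () => string): string {
  return existing !== undefined && existing.length > 0 ? existing : mint();
}

// Build a Set-Cookie header value for a long-lived first-party cookie.
// HttpOnly is left out on the assumption that browser code also reads the id.
function anonymousIdCookie(id: string): string {
  return [
    `aid=${encodeURIComponent(id)}`,
    "Path=/",
    `Max-Age=${ONE_YEAR_SECONDS}`,
    "SameSite=Lax",
    "Secure",
  ].join("; ");
}

// A returning visitor keeps the id, whether logged in or logged out.
const kept = ensureAnonymousId("u_abc", () => "u_new");
const setCookieValue = anonymousIdCookie(kept);
```

Because `ensureAnonymousId` never replaces an existing value, a logout followed by a login on the same device keeps the same link.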

## Step 3 — stitch in queries

Three increasingly thorough patterns.

### 3a. First-touch source per user

For "where did `user_42` come from?" — find the earliest event for any `anonymous_id` they've ever been associated with.

```sql
WITH user_anons AS (
  SELECT DISTINCT project_id, user_id, anonymous_id   -- project_id is needed by the USING clause below
  FROM events
  WHERE project_id = '...'
    AND user_id IS NOT NULL
)
SELECT
  u.user_id,
  argMin(e.source, e.timestamp)   AS first_touch_source,
  argMin(e.medium, e.timestamp)   AS first_touch_medium,
  min(e.timestamp)                AS first_seen_at
FROM events e
JOIN user_anons u USING (project_id, anonymous_id)
WHERE e.project_id = '...'
GROUP BY u.user_id;
```

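This works for **multi-device** users too: every device the user has signed in on appears in `user_anons`, because each sign-in emits a `$identify` carrying that device's `anonymous_id`. A device that never called identify stays unlinked.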
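The same first-touch logic can be sketched in TypeScript over an in-memory array, which may help when unit-testing attribution rules outside the warehouse. The `SourcedEvent` shape and the sample rows are made up for illustration; timestamps are assumed to be ISO 8601 strings in UTC so that string comparison matches time order.

```ts
interface SourcedEvent {
  timestamp: string;
  anonymousId: string;
  userId: string | null;
  source: string;
  medium: string;
}

interface FirstTouch {
  source: string;
  medium: string;
  firstSeenAt: string;
}

function firstTouchByUser(events: SourcedEvent[]): Map<string, FirstTouch> {
  // Step 1: anonymous_id -> user_ids, the equivalent of the user_anons CTE.
  const usersByAnon = new Map<string, Set<string>>();
  for (const e of events) {
    if (e.userId === null) continue;
    const users = usersByAnon.get(e.anonymousId) ?? new Set<string>();
    users.add(e.userId);
    usersByAnon.set(e.anonymousId, users);
  }
  // Step 2: keep the earliest event per user, the equivalent of argMin.
  const result = new Map<string, FirstTouch>();
  for (const e of events) {
    const users = usersByAnon.get(e.anonymousId);
    if (!users) continue; // this anonymous_id was never identified
    users.forEach((userId) => {
      const current = result.get(userId);
      if (!current || e.timestamp < current.firstSeenAt) {
        result.set(userId, { source: e.source, medium: e.medium, firstSeenAt: e.timestamp });
      }
    });
  }
  return result;
}

// Two devices for user_42, plus one visitor who never identified.
const sample: SourcedEvent[] = [
  { timestamp: "2024-01-02T14:00:00Z", anonymousId: "u_abc", userId: null, source: "facebook", medium: "paid" },
  { timestamp: "2024-01-05T09:30:00Z", anonymousId: "u_def", userId: null, source: "direct", medium: "direct" },
  { timestamp: "2024-01-08T11:14:00Z", anonymousId: "u_abc", userId: "user_42", source: "direct", medium: "direct" },
  { timestamp: "2024-01-10T08:00:00Z", anonymousId: "u_def", userId: "user_42", source: "direct", medium: "direct" },
  { timestamp: "2024-01-01T00:00:00Z", anonymousId: "u_zzz", userId: null, source: "google", medium: "organic" },
];

const firstTouches = firstTouchByUser(sample);
```

In this sample the earlier `u_zzz` visit is ignored because nothing links it to a user, so `user_42` is credited to the Facebook click.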

### 3b. Pre-login behaviour for one user

What was `user_42` doing before they signed up?

```sql
WITH their_anons AS (
  SELECT DISTINCT anonymous_id
  FROM events
  WHERE project_id = '...' AND user_id = 'user_42'
)
SELECT
  timestamp,
  event_name,
  source,
  medium,
  path,
  JSONExtractString(properties, 'utm_campaign') AS campaign
FROM events
WHERE project_id = '...'
  AND anonymous_id IN (SELECT anonymous_id FROM their_anons)
  AND user_id IS NULL                      -- only the pre-identify ones
ORDER BY timestamp;
```

### 3c. Time-to-conversion per user

```sql
WITH stitched AS (
  SELECT
    u.user_id,
    min(e.timestamp) AS first_seen,
    -- OrNull: return NULL, not the column's default value, when no row matches
    minOrNullIf(e.timestamp, e.event_name = 'signup') AS signed_up_at,
    minOrNullIf(e.timestamp, e.event_name = 'completed_checkout') AS first_purchase_at
  FROM events e
  JOIN (
    SELECT DISTINCT project_id, user_id, anonymous_id   -- project_id is needed by USING
    FROM events
    WHERE project_id = '...' AND user_id IS NOT NULL
  ) u USING (project_id, anonymous_id)
  WHERE e.project_id = '...'
  GROUP BY u.user_id
)
SELECT
  user_id,
  first_seen,
  signed_up_at,
  dateDiff('hour', first_seen, signed_up_at)        AS hours_to_signup,
  dateDiff('day',  signed_up_at, first_purchase_at) AS days_to_first_purchase
FROM stitched
WHERE signed_up_at IS NOT NULL;
```

## Multi-device

Every device the user signs in on emits its own `$identify`, linking that device's `anonymous_id` to `user_id`. The `anonymous_id` ↔ `user_id` relation becomes many-to-one over time, which is exactly what you want.

```sql
SELECT
  user_id,
  groupArray(DISTINCT anonymous_id) AS devices,
  count(DISTINCT anonymous_id) AS device_count
FROM events
WHERE project_id = '...'
  AND user_id IS NOT NULL
  AND timestamp > now() - INTERVAL 90 DAY
GROUP BY user_id
HAVING device_count > 1
ORDER BY device_count DESC;
```

## When NOT to call identify

| Moment                          | Identify?                                                             |
| ------------------------------- | --------------------------------------------------------------------- |
| Anonymous browsing              | no                                                                    |
| User signs up (creates account) | **yes**                                                               |
| User logs in (returning)        | **yes** — confirms link on this device                                |
| User logs out                   | no — keep tracking events anonymously, but don't reset `anonymous_id` |
| User switches account           | **yes** — call identify with the new `user_id`                        |

Calling identify on every page is fine. It's idempotent at the data level — you'll just have more `$identify` events.
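A small sketch of why repeated calls are harmless: the stitch queries only ever look at distinct `(user_id, anonymous_id)` pairs, so duplicates collapse. The `IdentifyEvent` shape and `distinctLinks` helper below are illustrative assumptions that mimic the `SELECT DISTINCT` used in Step 3.

```ts
interface IdentifyEvent {
  anonymousId: string;
  userId: string;
}

// Collapse $identify events to distinct (user_id, anonymous_id) pairs.
function distinctLinks(events: IdentifyEvent[]): IdentifyEvent[] {
  const seen = new Set<string>();
  const out: IdentifyEvent[] = [];
  for (const e of events) {
    const key = JSON.stringify([e.userId, e.anonymousId]);
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(e);
  }
  return out;
}

// identify fired on three page loads on one device, then once on another.
const identifyEvents: IdentifyEvent[] = [
  { anonymousId: "u_abc", userId: "user_42" },
  { anonymousId: "u_abc", userId: "user_42" },
  { anonymousId: "u_abc", userId: "user_42" },
  { anonymousId: "u_def", userId: "user_42" },
];

const linkPairs = distinctLinks(identifyEvents);
```

Four events reduce to two links, one per device, which is all the join needs.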

## Common pitfalls

* **Generating a new `anonymous_id` on logout.** Severs the link. Leave it alone.
* **Calling `identify(undefined)` or `identify("")` on logout.** Ditto.
* **Calling `/v1/identify` with a `pk_*` key and no `Origin` header.** Same CORS rules as `/v1/track`. Add the origin to the allowlist or use `sk_*`.
* **Expecting historical events to be rewritten.** They aren't. The stitch is *always* a query-time join. That's fine — ClickHouse joins on `(project_id, anonymous_id)` are very fast.
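One way to avoid the empty-id pitfall is to wrap whatever identify function you use in a guard. This is a sketch under assumptions: `guardIdentify` is a hypothetical helper, and `IdentifyFn` only mirrors the `identify(userId, traits)` call shape shown in Step 1 rather than the SDK's actual type.

```ts
type IdentifyFn = (userId: string, traits?: Record<string, unknown>) => void;

// Wrap an identify function so undefined, null, or blank ids never reach it.
// Returns true when the call was forwarded, false when it was skipped.
function guardIdentify(
  identify: IdentifyFn,
): (userId: string | null | undefined, traits?: Record<string, unknown>) => boolean {
  return (userId, traits) => {
    if (typeof userId !== "string" || userId.trim() === "") return false;
    identify(userId, traits);
    return true;
  };
}

// Usage with a recording stub standing in for the real identify function.
const forwarded: string[] = [];
const safeIdentify = guardIdentify((userId) => {
  forwarded.push(userId);
});

const outcomes = [safeIdentify(undefined), safeIdentify(""), safeIdentify("user_42")];
```

Only the last call is forwarded, so a logout handler that passes an empty id becomes a no-op instead of a bad `$identify`.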

## See also

* [Identities](/core-concepts/identities.md) — concepts.
* [POST /v1/identify](/api-reference/identify.md) — the endpoint.
* [Sessions](/core-concepts/sessions.md) — entry-source per visit.
* [GDPR right-to-be-forgotten](/recipes/gdpr-delete.md) — what `forget` does to anonymous events.

