# Attribution & the classifier

Every event runs through a **pure, ordered classifier** that maps `(url, referrer, request_host)` into a normalised `(source, medium, campaign, confidence, rule_id)` tuple. The classifier is deterministic, has no I/O, and runs server-side on every ingest.

## The output

```json
{
  "source": "facebook",
  "medium": "paid",
  "campaign": "spring_launch",
  "source_confidence": "high",
  "source_rule_id": "fb_ad_redirect"
}
```

* `source`: the network or origin (`facebook`, `google`, `twitter`, `direct`, `internal`, etc.).
* `medium`: how the visit was driven (`paid`, `organic`, `social`, `email`, `referral`, `direct`).
* `campaign`: the campaign label for the visit (`spring_launch` in the example above).
* `source_confidence` (`confidence` for short): `high` / `medium` / `low`. A signal for downstream filters.
* `source_rule_id` (`rule_id` for short): the exact rule that fired. Stored on the row so you can audit historical data and re-classify if the rules change.

## The rule cascade

First match wins. The full table:

| #  | Rule fires when…                                                                              | Result                                     | Confidence   |
| -- | --------------------------------------------------------------------------------------------- | ------------------------------------------ | ------------ |
| 1  | `gclid` in URL                                                                                | `google` / `paid`                          | high         |
| 2  | `msclkid`                                                                                     | `bing` / `paid`                            | high         |
| 3  | `ttclid`                                                                                      | `tiktok` / `paid`                          | high         |
| 4  | `li_fat_id`                                                                                   | `linkedin` / `paid`                        | high         |
| 5  | `fbclid` + referrer host is `l.facebook.com` / `lm.facebook.com`                              | `facebook` / `paid`                        | high         |
| 6  | `fbclid` + `utm_source` \~ facebook/instagram/meta                                            | `facebook` or `instagram` / `paid`         | high         |
| 7  | `utm_medium` ∈ paid set (`cpc`, `paid`, `paid_social`, …) + …                                 | …                                          | …            |
| 8  | `utm_medium` ∈ paid set + `utm_source` \~ instagram                                           | `instagram` / `paid`                       | high         |
| 9  | `utm_medium` ∈ paid set + any `utm_source`                                                    | `<utm_source>` / `paid`                    | high         |
| 10 | `utm_medium` ∈ paid set, no `utm_source`                                                      | `unknown` / `paid`                         | medium       |
| 11 | `fbclid` alone, no other context                                                              | `facebook` / `paid`                        | **medium**   |
| 12 | `utm_source` explicit, no paid signal                                                         | `<utm_source>` / `<utm_medium ?? unknown>` | high         |
| 13 | Referrer host ∈ {facebook.com, m.facebook.com, l.facebook.com, …} without `fbclid`            | `facebook` / `social`                      | medium       |
| 14 | Referrer host ∈ {instagram.com, l.instagram.com}                                              | `instagram` / `social`                     | medium       |
| 15 | Referrer is a known social network (twitter, x, linkedin, reddit, tiktok, youtube, pinterest) | `<network>` / `social`                     | medium       |
| 16 | Referrer is a known search engine (google.\*, bing.\*, duckduckgo, …)                         | `<engine>` / `organic`                     | medium       |
| 17 | Referrer is an email-client / ESP                                                             | `email` / `email`                          | medium       |
| 18 | Referrer host == page host                                                                    | `internal` / `direct`                      | high         |
| 19 | No referrer + no UTM + no click ID                                                            | `direct` / `direct`                        | high         |
| 20 | Anything else                                                                                 | `<host slug>` / `referral`                 | **low**      |
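The first-match-wins behaviour of the table can be sketched as an ordered array of rules that is scanned until one matches. This is a minimal illustration covering only rows 1, 2, 18, 19, and 20. The type names, rule ids, the `www.` stripping used for the host slug, and the use of `request_host` as a fallback page host are all assumptions made for the example, not the contents of `packages/classifier/src/rules.ts`.

```typescript
// Sketch only: type names, rule ids, and helpers are hypothetical.
type Confidence = "high" | "medium" | "low";

interface Attribution {
  source: string;
  medium: string;
  source_confidence: Confidence;
  source_rule_id: string;
}

interface Signals {
  params: URLSearchParams;     // query parameters of the page URL
  pageHost: string;            // host of the page URL, else request_host (assumed)
  referrerHost: string | null; // null when there is no referrer
}

type Match = Omit<Attribution, "source_rule_id">;

interface Rule {
  id: string;
  match: (s: Signals) => Match | null;
}

const CLICK_IDS = ["gclid", "msclkid", "ttclid", "li_fat_id", "fbclid"];
const UTM_KEYS = ["utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"];

const hasAny = (p: URLSearchParams, keys: string[]): boolean => keys.some((k) => p.has(k));
const hit = (source: string, medium: string, c: Confidence): Match => ({
  source,
  medium,
  source_confidence: c,
});

// Table order is preserved: an earlier rule shadows every later one.
const RULES: Rule[] = [
  { id: "gclid", match: (s) => (s.params.has("gclid") ? hit("google", "paid", "high") : null) },
  { id: "msclkid", match: (s) => (s.params.has("msclkid") ? hit("bing", "paid", "high") : null) },
  {
    id: "internal",
    match: (s) =>
      s.referrerHost !== null && s.referrerHost === s.pageHost
        ? hit("internal", "direct", "high")
        : null,
  },
  {
    id: "direct",
    match: (s) =>
      s.referrerHost === null && !hasAny(s.params, UTM_KEYS) && !hasAny(s.params, CLICK_IDS)
        ? hit("direct", "direct", "high")
        : null,
  },
];

function classify(url: string, referrer: string | null, requestHost: string): Attribution {
  const page = url ? new URL(url) : null;
  const signals: Signals = {
    params: page ? page.searchParams : new URLSearchParams(),
    pageHost: page ? page.hostname : requestHost,
    referrerHost: referrer ? new URL(referrer).hostname : null,
  };
  for (const rule of RULES) {
    const match = rule.match(signals);
    if (match) return { ...match, source_rule_id: rule.id }; // first match wins
  }
  // Row 20: anything else falls through to a low-confidence referral.
  const slug = signals.referrerHost ? signals.referrerHost.replace(/^www\./, "") : "unknown";
  return { source: slug, medium: "referral", source_confidence: "low", source_rule_id: "fallback_referral" };
}
```

In this sketch a URL that carries both `gclid` and `msclkid` resolves to `google` / `paid`, because the `gclid` rule sits earlier in the array. Keeping the rules as plain data plus a single loop is one way to keep the function pure and easy to unit-test.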

## The Facebook social-vs-paid resolution (the headline)

This is the case we obsess over because it's where most tools blur the line:

| You see                                     | Most analytics show | Millimetric shows              |
| ------------------------------------------- | ------------------- | ------------------------------ |
| `?fbclid=abc` arriving via `l.facebook.com` | Facebook (mixed)    | **facebook / paid** (high)     |
| `?utm_source=facebook&utm_medium=cpc`       | Facebook            | **facebook / paid** (high)     |
| `?fbclid=abc` from anywhere else            | Facebook (mixed)    | **facebook / paid** (medium)   |
| Referrer `facebook.com`, no `fbclid`        | Facebook            | **facebook / social** (medium) |
| Referrer `m.facebook.com`, no `fbclid`      | Facebook            | **facebook / social** (medium) |

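The same split applies to Instagram: a referrer of `instagram.com` with no paid signal classifies as `instagram` / `social`, while `utm_source=instagram&utm_medium=paid_social` classifies as `instagram` / `paid`.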
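The Facebook rows of the comparison table can be read as a small decision function. The sketch below is an illustration under assumptions: the function and field names are hypothetical, `utm_source` is compared by exact equality rather than the fuzzy match the rule table implies, the referrer host list is truncated in the rule table and only the three hosts named there are used, and `cpc`, `paid`, `paid_social` are taken as a possibly incomplete subset of the paid set.

```typescript
// Sketch of the Facebook paid-vs-social split; names and lists are illustrative.
type Verdict = { medium: "paid" | "social"; confidence: "high" | "medium" };

interface FacebookSignals {
  hasFbclid: boolean;
  referrerHost: string | null;
  utmSource?: string;
  utmMedium?: string;
}

const FB_AD_REDIRECT_HOSTS = ["l.facebook.com", "lm.facebook.com"];
// The rule table elides the rest of this list; only the named hosts appear here.
const FB_REFERRER_HOSTS = ["facebook.com", "m.facebook.com", "l.facebook.com"];
// Assumed subset of the paid set.
const PAID_MEDIUMS = ["cpc", "paid", "paid_social"];

function resolveFacebook(s: FacebookSignals): Verdict | null {
  const host = s.referrerHost ?? "";
  const paidMedium = PAID_MEDIUMS.includes(s.utmMedium ?? "");

  // fbclid arriving through Facebook's link redirector: paid, high.
  if (s.hasFbclid && FB_AD_REDIRECT_HOSTS.includes(host)) {
    return { medium: "paid", confidence: "high" };
  }
  // Explicit Facebook UTM source backed by fbclid or a paid medium: paid, high.
  if (s.utmSource === "facebook" && (s.hasFbclid || paidMedium)) {
    return { medium: "paid", confidence: "high" };
  }
  // fbclid with no supporting context: probably paid, so medium confidence.
  if (s.hasFbclid) {
    return { medium: "paid", confidence: "medium" };
  }
  // Facebook referrer without fbclid: treated as a social visit.
  if (FB_REFERRER_HOSTS.includes(host)) {
    return { medium: "social", confidence: "medium" };
  }
  return null; // not a Facebook visit as far as these checks can tell
}
```

The order of the checks carries the logic: the strongest evidence is tested first, so a bare `fbclid` only downgrades to medium confidence when nothing more specific matched.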

## Why `confidence`

Some rules are inherently softer than others. `gclid` is *only* emitted by Google Ads — high confidence. `fbclid` alone *usually* means a paid click but can leak onto organic shares — medium confidence. A novel referrer we've never seen — low confidence.

In dashboards or downstream pipelines, you can filter on `source_confidence` to exclude noisy attribution when you want clean revenue-attribution numbers.
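As an illustration of such a filter, the sketch below sums revenue per channel while dropping rows under a chosen confidence floor. The `revenue` field, the row shape, and the function name are hypothetical; only `source`, `medium`, and `source_confidence` come from the classifier output described on this page.

```typescript
// Sketch only: the row shape and the revenue column are assumptions.
type Confidence = "high" | "medium" | "low";

interface AttributedOrder {
  source: string;
  medium: string;
  source_confidence: Confidence;
  revenue: number; // hypothetical column joined in from order data
}

// Ordering of the three levels, so a floor can be expressed as a comparison.
const CONFIDENCE_RANK: Record<Confidence, number> = { low: 0, medium: 1, high: 2 };

function revenueByChannel(
  rows: AttributedOrder[],
  minConfidence: Confidence,
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    // Skip rows whose attribution is softer than the requested floor.
    if (CONFIDENCE_RANK[row.source_confidence] < CONFIDENCE_RANK[minConfidence]) continue;
    const channel = `${row.source} / ${row.medium}`;
    totals[channel] = (totals[channel] ?? 0) + row.revenue;
  }
  return totals;
}
```

Calling `revenueByChannel(rows, "high")` keeps only high-confidence rows, while `"medium"` also admits cases such as a bare `fbclid`.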

## Why `rule_id`

Stored on every event row. If the rules change in v2 and you want to re-attribute historical data, you can identify exactly which rule fired on each row and apply targeted updates instead of re-classifying everything.
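A targeted update of this kind might look like the sketch below, which patches only the rows produced by one rule and leaves every other row untouched. The row shape, the function name, and the `fbclid_alone` rule id used in the usage note are hypothetical; the only rule id shown on this page is `fb_ad_redirect`.

```typescript
// Sketch only: row shape and names are assumptions for illustration.
type Confidence = "high" | "medium" | "low";

interface EventRow {
  event_id: string;
  source: string;
  medium: string;
  source_confidence: Confidence;
  source_rule_id: string;
}

type AttributionPatch = Partial<Pick<EventRow, "source" | "medium" | "source_confidence">>;

// Returns a new array; rows that did not come from `ruleId` are passed through as-is.
function reattributeByRule(
  rows: EventRow[],
  ruleId: string,
  patch: AttributionPatch,
): EventRow[] {
  return rows.map((row) => (row.source_rule_id === ruleId ? { ...row, ...patch } : row));
}
```

For example, if a later rule set decided that a bare `fbclid` deserved less trust, `reattributeByRule(rows, "fbclid_alone", { source_confidence: "low" })` would touch only the rows that rule produced.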

## What it does *not* do

* **No fingerprinting.** No IP-based linkage, no canvas, no fonts. The only inputs are `(url, referrer, request_host)`.
* **No probabilistic attribution.** The classifier is rule-based and deterministic.
* **No machine learning.** Rule data lives in `packages/classifier/src/rules.ts` and is unit-tested against \~31 scenarios in `classify.test.ts`.

## Per-event vs. per-session attribution

The classifier evaluates *each event individually*. So a backend `signup` event POSTed without `url`/`referrer` will classify as `direct` — that's per-event truth.

For "where did this visitor enter from", query the **`sessions_v`** view instead. It captures the first-touch source/medium per `(anonymous_id, 30-min window)` and is the right answer for revenue attribution.

```sql
SELECT entry_source, entry_medium, count() AS sessions
FROM sessions_v
GROUP BY entry_source, entry_medium
ORDER BY sessions DESC;
```
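To show how first-touch differs from per-event classification, here is a rough sketch of deriving session entries from already-classified events. It assumes that the "30-min window" means a new session starts after more than 30 minutes of inactivity, and that timestamps are epoch milliseconds. Both are interpretations made for this example, and the names are hypothetical rather than the actual definition of `sessions_v`.

```typescript
// Sketch only: an in-memory approximation, not the real view definition.
interface ClassifiedEvent {
  anonymous_id: string;
  timestamp: number; // epoch milliseconds (assumed)
  source: string;
  medium: string;
}

interface SessionEntry {
  anonymous_id: string;
  started_at: number;
  entry_source: string;
  entry_medium: string;
}

const SESSION_GAP_MS = 30 * 60 * 1000; // assumed inactivity threshold

function firstTouchSessions(events: ClassifiedEvent[]): SessionEntry[] {
  // Order by visitor, then by time, without mutating the input.
  const sorted = events.slice().sort((a, b) => {
    if (a.anonymous_id === b.anonymous_id) return a.timestamp - b.timestamp;
    return a.anonymous_id < b.anonymous_id ? -1 : 1;
  });

  const sessions: SessionEntry[] = [];
  let prev: ClassifiedEvent | null = null;
  for (const event of sorted) {
    const startsNewSession =
      prev === null ||
      prev.anonymous_id !== event.anonymous_id ||
      event.timestamp - prev.timestamp > SESSION_GAP_MS;
    if (startsNewSession) {
      // Only the first event of a session decides entry_source / entry_medium.
      sessions.push({
        anonymous_id: event.anonymous_id,
        started_at: event.timestamp,
        entry_source: event.source,
        entry_medium: event.medium,
      });
    }
    prev = event;
  }
  return sessions;
}
```

Under these assumptions, a backend `signup` event that classifies as `direct` in the middle of a session does not change that session's entry source, which is why the session view suits revenue attribution better than raw per-event rows.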

