Attribution & the classifier

How the classifier turns a URL + referrer into (source, medium, confidence).

Every event runs through a pure, ordered classifier that maps (url, referrer, request_host) into a normalised (source, medium, campaign, confidence, rule_id) tuple. The classifier is deterministic, has no I/O, and runs server-side on every ingest.

The output

{
  "source": "facebook",
  "medium": "paid",
  "campaign": "spring_launch",
  "source_confidence": "high",
  "source_rule_id": "fb_ad_redirect"
}
  • source: the network or origin (facebook, google, twitter, direct, internal, etc.).

  • medium: how the visit was driven (paid, organic, social, email, referral, direct).

  • confidence (stored as source_confidence): high / medium / low. A signal for downstream filters.

  • rule_id (stored as source_rule_id): the exact rule that fired. Stored on the row so you can audit historical data and re-classify if the rules change.
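
For typed consumers, the output shape can be sketched as a TypeScript interface. The field names mirror the JSON example above; the type names (`Confidence`, `Attribution`) and the nullable `campaign` are assumptions for illustration, not the package's published types.

```typescript
// Hypothetical type names; field names follow the JSON example in this section.
type Confidence = "high" | "medium" | "low";

interface Attribution {
  source: string;                // e.g. "facebook", "google", "direct"
  medium: string;                // e.g. "paid", "organic", "social"
  campaign: string | null;       // assumption: null when no campaign is present
  source_confidence: Confidence; // signal for downstream filters
  source_rule_id: string;        // the rule that fired, kept for audits
}

const example: Attribution = {
  source: "facebook",
  medium: "paid",
  campaign: "spring_launch",
  source_confidence: "high",
  source_rule_id: "fb_ad_redirect",
};
```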

The rule cascade

First match wins. The full table:

| # | Rule fires when… | Result | Confidence |
|---|---|---|---|
| 1 | gclid in URL | google / paid | high |
| 2 | msclkid | bing / paid | high |
| 3 | ttclid | tiktok / paid | high |
| 4 | li_fat_id | linkedin / paid | high |
| 5 | fbclid + referrer host is l.facebook.com / lm.facebook.com | facebook / paid | high |
| 6 | fbclid + utm_source ~ facebook/instagram/meta | facebook or instagram / paid | high |
| 7 | utm_medium = cpc, paid, paid_social, … | … | … |
| 8 | utm_medium ∈ paid set + utm_source ~ instagram | instagram / paid | high |
| 9 | utm_medium ∈ paid set + any utm_source | <utm_source> / paid | high |
| 10 | utm_medium ∈ paid set, no utm_source | unknown / paid | medium |
| 11 | fbclid alone, no other context | facebook / paid | medium |
| 12 | utm_source explicit, no paid signal | <utm_source> / <utm_medium ?? unknown> | high |
| 13 | Referrer host ∈ {facebook.com, m.facebook.com, l.facebook.com, …} without fbclid | facebook / social | medium |
| 14 | Referrer host ∈ {instagram.com, l.instagram.com} | instagram / social | medium |
| 15 | Referrer is a known social network (twitter, x, linkedin, reddit, tiktok, youtube, pinterest) | <network> / social | medium |
| 16 | Referrer is a known search engine (google.*, bing.*, duckduckgo, …) | <engine> / organic | medium |
| 17 | Referrer is an email-client / ESP | email / email | medium |
| 18 | Referrer host == page host | internal / direct | high |
| 19 | No referrer + no UTM + no click ID | direct / direct | high |
| 20 | Anything else | <host slug> / referral | low |
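
The first-match-wins behaviour can be sketched as an ordered array of rule functions, where array order is cascade order. This is a minimal illustration covering only rules 1, 2, 5, 11, 18, 19 and 20, not the contents of rules.ts. Apart from `fb_ad_redirect`, which appears in the output example, every rule ID and helper name below is hypothetical, and using the raw referrer hostname as the "host slug" is an assumption.

```typescript
type Confidence = "high" | "medium" | "low";

interface Result { source: string; medium: string; confidence: Confidence; ruleId: string }

interface Ctx {
  params: URLSearchParams; // query string of the page URL
  refHost: string | null;  // referrer hostname, null when there is no referrer
  pageHost: string;        // hostname of the page URL
}

// A rule returns a result when it fires, or null to pass to the next rule.
type Rule = (c: Ctx) => Result | null;

const hit = (source: string, medium: string, confidence: Confidence, ruleId: string): Result =>
  ({ source, medium, confidence, ruleId });

const FB_AD_REDIRECT_HOSTS = new Set(["l.facebook.com", "lm.facebook.com"]);

// Array order is the cascade order; only a subset of the table is shown.
const rules: Rule[] = [
  (c) => (c.params.has("gclid") ? hit("google", "paid", "high", "gclid") : null),
  (c) => (c.params.has("msclkid") ? hit("bing", "paid", "high", "msclkid") : null),
  (c) =>
    c.params.has("fbclid") && c.refHost !== null && FB_AD_REDIRECT_HOSTS.has(c.refHost)
      ? hit("facebook", "paid", "high", "fb_ad_redirect")
      : null,
  // In the full table, rules 6 to 10 would handle fbclid with UTM context first.
  (c) => (c.params.has("fbclid") ? hit("facebook", "paid", "medium", "fbclid_only") : null),
  (c) => (c.refHost !== null && c.refHost === c.pageHost ? hit("internal", "direct", "high", "same_host") : null),
  (c) =>
    c.refHost === null && !c.params.has("utm_source") && !c.params.has("utm_medium")
      ? hit("direct", "direct", "high", "no_signal")
      : null,
];

// Pure and deterministic: no I/O, same input always yields the same output.
// Note: new URL() throws on malformed input; real ingest code would guard that.
function classify(url: string, referrer: string | null): Result {
  const page = new URL(url);
  const ctx: Ctx = {
    params: page.searchParams,
    pageHost: page.hostname,
    refHost: referrer ? new URL(referrer).hostname : null,
  };
  for (const rule of rules) {
    const result = rule(ctx);
    if (result !== null) return result; // first match wins
  }
  // Rule 20: anything else falls through to a low-confidence referral.
  return hit(ctx.refHost ?? "unknown", "referral", "low", "fallback_referral");
}
```

Because `gclid` sits above every `fbclid` rule, a URL carrying both click IDs resolves to google / paid regardless of the referrer.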

The Facebook social-vs-paid resolution (the headline)

This is the case we obsess over because it's where most tools blur the line:

| You see | Most analytics show | Millimetric shows |
|---|---|---|
| ?fbclid=abc arriving via l.facebook.com | Facebook (mixed) | facebook / paid (high) |
| ?utm_source=facebook&utm_medium=cpc | Facebook | facebook / paid (high) |
| ?fbclid=abc from anywhere else | Facebook (mixed) | facebook / paid (medium) |
| Referrer facebook.com, no fbclid | Facebook | facebook / social (medium) |
| Referrer m.facebook.com, no fbclid | Facebook | facebook / social (medium) |

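The same split applies to Instagram: a bare instagram.com referrer resolves to instagram / social, while utm_source=instagram&utm_medium=paid_social resolves to instagram / paid.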
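
The five Facebook rows above can be expressed as a small decision function. This is a sketch under stated assumptions rather than the shipped rule code: the function and type names are hypothetical, the paid set contains only the three utm_medium values visible in the cascade table, the referrer list contains only the hosts named there (the original list is elided with "…"), and utm_source is matched exactly instead of fuzzily.

```typescript
interface FacebookSignals {
  hasFbclid: boolean;
  refHost: string | null;   // referrer hostname, null when absent
  utmSource: string | null;
  utmMedium: string | null;
}

interface Verdict { medium: "paid" | "social"; confidence: "high" | "medium" }

const FB_AD_REDIRECT_HOSTS = new Set(["l.facebook.com", "lm.facebook.com"]);
// Only the hosts named in the cascade table; the source list is longer.
const FB_REFERRER_HOSTS = new Set(["facebook.com", "m.facebook.com", "l.facebook.com"]);
// Assumption: the paid set may contain more values than these three.
const PAID_MEDIUMS = new Set(["cpc", "paid", "paid_social"]);

// Returns null when none of the Facebook rows apply.
function resolveFacebook(s: FacebookSignals): Verdict | null {
  const viaAdRedirect = s.refHost !== null && FB_AD_REDIRECT_HOSTS.has(s.refHost);
  if (s.hasFbclid && viaAdRedirect) return { medium: "paid", confidence: "high" };

  const paidUtm =
    s.utmSource === "facebook" && s.utmMedium !== null && PAID_MEDIUMS.has(s.utmMedium);
  if (paidUtm) return { medium: "paid", confidence: "high" };

  // fbclid with no corroborating context: probably paid, but it can leak onto shares.
  if (s.hasFbclid) return { medium: "paid", confidence: "medium" };

  // Facebook referrer without fbclid: treated as organic social.
  if (s.refHost !== null && FB_REFERRER_HOSTS.has(s.refHost)) {
    return { medium: "social", confidence: "medium" };
  }
  return null;
}
```

The ordering carries the logic: the corroborated paid cases are checked before the bare `fbclid` case, and the social case is only reachable when no `fbclid` is present.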

Why confidence

Some rules are inherently softer than others. gclid is only emitted by Google Ads — high confidence. fbclid alone usually means a paid click but can leak onto organic shares — medium confidence. A novel referrer we've never seen — low confidence.

In dashboards or downstream pipelines, you can filter on source_confidence to exclude noisy attribution when you want clean revenue-attribution numbers.
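
As an illustration of that filter, the sketch below totals revenue per source / medium while keeping only rows at an allowed confidence level. The `AttributedOrder` shape and its `revenue_cents` column are hypothetical; only `source`, `medium` and `source_confidence` come from the classifier output.

```typescript
type Confidence = "high" | "medium" | "low";

interface AttributedOrder {
  source: string;
  medium: string;
  source_confidence: Confidence;
  revenue_cents: number; // hypothetical column, not part of the classifier output
}

// Sums revenue per "source / medium", skipping rows below the allowed confidence.
function revenueBySource(
  rows: AttributedOrder[],
  allowed: ReadonlySet<Confidence>,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    if (!allowed.has(row.source_confidence)) continue;
    const key = `${row.source} / ${row.medium}`;
    totals.set(key, (totals.get(key) ?? 0) + row.revenue_cents);
  }
  return totals;
}

const orders: AttributedOrder[] = [
  { source: "google", medium: "paid", source_confidence: "high", revenue_cents: 2000 },
  { source: "google", medium: "paid", source_confidence: "high", revenue_cents: 3000 },
  { source: "facebook", medium: "paid", source_confidence: "medium", revenue_cents: 1500 },
  { source: "news.example.org", medium: "referral", source_confidence: "low", revenue_cents: 700 },
];

// Strict view: high-confidence attribution only.
const strict = revenueBySource(orders, new Set<Confidence>(["high"]));
// Looser view: everything except low-confidence referrals.
const relaxed = revenueBySource(orders, new Set<Confidence>(["high", "medium"]));
```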

Why rule_id

The rule_id is stored on every event row. If the rules change in v2 and you want to re-attribute historical data, you can identify exactly which rule fired on each row and apply targeted updates instead of re-classifying everything.
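
A targeted update of that kind might look like the following sketch: only rows whose stored rule ID is in the changed set are passed back through a classifier, and all other rows are returned untouched. The `EventRow` shape, the function name, and the injected `classify` callback are all assumptions for illustration.

```typescript
interface EventRow {
  id: string;
  url: string;
  referrer: string | null;
  source: string;
  medium: string;
  source_rule_id: string;
}

type Reattribution = Pick<EventRow, "source" | "medium" | "source_rule_id">;

// Re-runs the classifier only for rows produced by a rule that has changed.
function reclassifyByRule(
  rows: EventRow[],
  changedRuleIds: ReadonlySet<string>,
  classify: (row: EventRow) => Reattribution,
): EventRow[] {
  return rows.map((row) =>
    changedRuleIds.has(row.source_rule_id) ? { ...row, ...classify(row) } : row,
  );
}

const history: EventRow[] = [
  { id: "e1", url: "https://shop.example/?fbclid=a", referrer: null,
    source: "facebook", medium: "paid", source_rule_id: "fbclid_only" },
  { id: "e2", url: "https://shop.example/?gclid=b", referrer: null,
    source: "google", medium: "paid", source_rule_id: "gclid" },
];

// Hypothetical v2 behaviour: a bare fbclid is now treated as social.
const v2Classify = (_row: EventRow): Reattribution =>
  ({ source: "facebook", medium: "social", source_rule_id: "fbclid_only_v2" });

const updated = reclassifyByRule(history, new Set(["fbclid_only"]), v2Classify);
```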

What it does not do

  • No fingerprinting. No IP-based linkage, no canvas, no fonts. Just (url, referrer).

  • No probabilistic attribution. The classifier is rule-based and deterministic.

  • No machine learning. Rule data lives in packages/classifier/src/rules.ts and is unit-tested against ~31 scenarios in classify.test.ts.

Per-event vs. per-session attribution

The classifier evaluates each event individually. So a backend signup event POSTed without url/referrer will classify as direct — that's per-event truth.

For "where did this visitor enter from", query the sessions_v view instead. It captures the first-touch source/medium per (anonymous_id, 30-min window) and is the right answer for revenue attribution.
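
To make the per-event versus per-session distinction concrete, here is a sketch of first-touch selection over already-classified events. It is not the definition of sessions_v: it assumes the 30-minute window means a 30-minute inactivity gap, and the `AttributedEvent` and `SessionTouch` shapes are hypothetical.

```typescript
interface AttributedEvent {
  anonymous_id: string;
  timestamp: number; // epoch milliseconds
  source: string;
  medium: string;
}

interface SessionTouch {
  anonymous_id: string;
  started_at: number;
  source: string;
  medium: string;
}

// Assumption: a session ends after 30 minutes without events from the visitor.
const SESSION_GAP_MS = 30 * 60 * 1000;

function firstTouchPerSession(events: AttributedEvent[]): SessionTouch[] {
  // Sort a copy by visitor, then by time, so sessions are contiguous runs.
  const sorted = [...events].sort(
    (a, b) => a.anonymous_id.localeCompare(b.anonymous_id) || a.timestamp - b.timestamp,
  );
  const sessions: SessionTouch[] = [];
  let prev: AttributedEvent | null = null;
  for (const ev of sorted) {
    const startsSession =
      prev === null ||
      prev.anonymous_id !== ev.anonymous_id ||
      ev.timestamp - prev.timestamp > SESSION_GAP_MS;
    if (startsSession) {
      // The first event of the session supplies the session's source/medium.
      sessions.push({
        anonymous_id: ev.anonymous_id,
        started_at: ev.timestamp,
        source: ev.source,
        medium: ev.medium,
      });
    }
    prev = ev;
  }
  return sessions;
}

const minute = 60 * 1000;
const touches = firstTouchPerSession([
  { anonymous_id: "v1", timestamp: 0, source: "facebook", medium: "paid" },
  // Backend signup posted without url/referrer classifies as direct per event.
  { anonymous_id: "v1", timestamp: 5 * minute, source: "direct", medium: "direct" },
  { anonymous_id: "v1", timestamp: 120 * minute, source: "google", medium: "organic" },
]);
```

In this example the signup event is direct on its own row, but the session it belongs to is attributed to facebook / paid, and the visit two hours later opens a second session.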
