
Sessions

How session_id is derived, when to override it, and what the sessions view does for you.

A session is a span of activity from a single visitor with no more than 30 minutes of idle time between events. Sessions exist mainly to answer one question well: what brought this person in?

In Millimetric, sessions are a derived concept. The raw events table has a session_id per row; the sessions materialised view aggregates them.

How session_id is derived

If you don't send session_id, the server fills it in:

session_id = `${anonymous_id}-${30min_bucket}`

30min_bucket = floor(unix_seconds / 1800)

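So a single visitor's events get the same session_id as long as they happen inside the same 30-minute window. The window is a fixed 1800-second bucket of clock time, not a timer that restarts on every event: as soon as an event's unix_seconds falls into the next bucket, the bucket number increments and the visitor is in a new session.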
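A minimal sketch of the formula above, for illustration only. The function name is hypothetical, and this page does not show the server's actual implementation.

```python
BUCKET_SECONDS = 1800  # 30 minutes, per the formula above


def derive_session_id(anonymous_id: str, unix_seconds: float) -> str:
    """Return the auto-derived session id for one event."""
    bucket = int(unix_seconds // BUCKET_SECONDS)  # floor(unix_seconds / 1800)
    return f"{anonymous_id}-{bucket}"


# Two events ten minutes apart, both inside the same 1800-second bucket.
first = derive_session_id("u_abc", 1_700_000_000)
second = derive_session_id("u_abc", 1_700_000_600)
print(first, second, first == second)
```

Because the id depends only on anonymous_id and the timestamp, replaying the same events later yields the same ids.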

This is intentionally simple:

  • Stateless. No server-side timer per visitor; sessions fall out of timestamps + anonymous id.

  • Deterministic. The same input produces the same session_id whether it's processed live or replayed from a backfill.

  • Cheap. The classifier and ingest path don't read or write session state.

The trade-off: edges of the bucket can split or merge sessions slightly differently than a "30 min since last event" sliding window. For the analytics queries Millimetric optimises for, the difference is rounding noise.
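The bucket-edge effect can be checked with a few lines of Python. The sliding-window function here is a comparison baseline written for this example, not something Millimetric runs, and timestamps are plain unix seconds.

```python
BUCKET_SECONDS = 1800


def bucket_sessions(timestamps: list[int]) -> list[int]:
    """Fixed-bucket rule from this page: one session per 1800-second bucket."""
    return [t // BUCKET_SECONDS for t in timestamps]


def sliding_sessions(timestamps: list[int], idle: int = 1800) -> list[int]:
    """Comparison rule: start a new session after more than `idle` seconds of silence."""
    labels: list[int] = []
    current = 0
    for i, t in enumerate(timestamps):
        if i > 0 and t - timestamps[i - 1] > idle:
            current += 1
        labels.append(current)
    return labels


# Two events 2 minutes apart that straddle the bucket boundary at t = 3600.
straddle = [3540, 3660]
print(bucket_sessions(straddle))   # different bucket numbers: two sessions
print(sliding_sessions(straddle))  # same label: one session

# One event every 10 minutes for 80 minutes of continuous activity.
steady = list(range(0, 5400, 600))
print(len(set(bucket_sessions(steady))), len(set(sliding_sessions(steady))))
```

The bucket rule splits a visit wherever a boundary falls, while the sliding window keeps it whole. Attribution reads the first event of each session, so a split visit shows up as extra session rows.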

Overriding session_id

You should override only when you need session boundaries to mean something specific in your product. For example:

  • A long-form video player where you want a "watch session" to span hours.

  • A multi-step onboarding that you want grouped even across reloads.

  • An AI agent where each task_id is a session for telemetry purposes.

The browser SDK doesn't expose a setSessionId — the auto-derived one is what you want for web. Override only via the HTTP layer.
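A sketch of the override for the AI-agent case above. This page only establishes that a session_id you send is kept rather than derived. The other field names (`event`, `timestamp`), the `task-` prefix, and the overall payload shape are assumptions for illustration, so check them against the events API reference before use.

```python
import json
import time


def agent_event(anonymous_id: str, task_id: str, name: str) -> dict:
    """Build an event body whose session boundary is the agent task, not the clock."""
    return {
        "anonymous_id": anonymous_id,
        "session_id": f"task-{task_id}",  # explicit value, so nothing is derived
        "event": name,                    # assumed field name
        "timestamp": int(time.time()),    # assumed field name and unit
    }


# Every event of the same task carries the same session_id, however long it runs.
body = json.dumps(agent_event("u_abc", "8f3a", "tool_call"))
# POST `body` to the event ingest endpoint with your HTTP client of choice.
```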

The sessions materialised view

sessions is a ClickHouse AggregatingMergeTree view that groups events by (project_id, session_id, anonymous_id) and stores:

| Column | What it is |
| --- | --- |
| session_id | derived or user-supplied |
| anonymous_id | the visitor |
| user_id | filled in if /v1/identify happened during the session |
| started_at | min(timestamp) |
| ended_at | max(timestamp) |
| duration_s | ended_at - started_at |
| event_count | count of all events in the session |
| pageview_count | count of $pageview events |
| entry_source / entry_medium / entry_campaign | classifier output on the first event of the session |
| entry_url / entry_path / entry_referrer | from the first event |
| country / device_type / browser / os | first event's enrichment |
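To make the column definitions concrete, here is an in-memory sketch that collapses one session's events into a sessions-style row. It is not the view's SQL. The event dict keys (`event`, `timestamp` in unix seconds, `source`, `medium`) and the classifier labels are assumptions for illustration.

```python
from typing import Optional


def build_session_row(events: list[dict]) -> dict:
    """Collapse the events of one session into a sessions-style row."""
    ordered = sorted(events, key=lambda e: e["timestamp"])
    first, last = ordered[0], ordered[-1]
    # user_id is taken from the first event in the session that carries one.
    user_id: Optional[str] = next(
        (e["user_id"] for e in ordered if e.get("user_id")), None
    )
    return {
        "session_id": first["session_id"],
        "anonymous_id": first["anonymous_id"],
        "user_id": user_id,
        "started_at": first["timestamp"],                       # min(timestamp)
        "ended_at": last["timestamp"],                          # max(timestamp)
        "duration_s": last["timestamp"] - first["timestamp"],
        "event_count": len(ordered),
        "pageview_count": sum(1 for e in ordered if e["event"] == "$pageview"),
        "entry_source": first.get("source"),                    # first event only
        "entry_medium": first.get("medium"),
    }


base = {"session_id": "u_abc-944444", "anonymous_id": "u_abc"}
events = [
    {**base, "event": "$pageview", "timestamp": 100, "source": "facebook", "medium": "paid"},
    {**base, "event": "$identify", "timestamp": 160, "source": "direct", "user_id": "user_42"},
    {**base, "event": "signup", "timestamp": 220, "source": "direct", "user_id": "user_42"},
]
row = build_session_row(events)
print(row["entry_source"], row["user_id"], row["duration_s"])
```

The later events are labelled direct here, yet the row keeps the entry values of the first event and picks up the user_id seen later in the session.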

Why entry_source is the right answer for attribution

Per-event attribution is per-event truth: a server-side signup POSTed without url or referrer will classify as direct. That's correct for that event, but it's the wrong answer for "where did this customer come from".

The entry_source on a session is the first-touch attribution for that visit. Query it instead of raw events when you want to credit the channel that brought someone in.

This is what powers /v1/sources — the endpoint that gives you Facebook paid and Facebook social on separate rows.

Pre-login → post-login in a session

If a visitor identifies during a session, the sessions row sees both:

  • anonymous_id = u_abc for the whole session

  • user_id = user_42 because at least one event in the session carried it (the $identify itself)

  • entry_source / entry_medium from the first event (which was anonymous)

Result: the session is correctly credited to the channel that brought them in, with the user id stitched on.

Multiple sessions per visitor

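Common. Treat them independently for attribution purposes: each session row carries its own entry_source, so every visit is credited to the channel that started it.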

Multi-touch attribution (e.g. credit each channel proportionally) is a query-layer concern — Millimetric stores raw and first-touch; you compose more sophisticated models on top.
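As one example of composing a model on top, the sketch below contrasts first-touch credit with a linear (equal-split) multi-touch model over a visitor's sessions. Linear credit is just one possible model, chosen here for brevity. The function names and the list-of-dicts input are hypothetical.

```python
from collections import defaultdict


def first_touch(sessions: list[dict]) -> str:
    """Credit the channel of the visitor's earliest session."""
    return min(sessions, key=lambda s: s["started_at"])["entry_source"]


def linear_credit(sessions: list[dict]) -> dict[str, float]:
    """Split one conversion's credit equally across all of the visitor's sessions."""
    if not sessions:
        return {}
    share = 1.0 / len(sessions)
    credit: dict[str, float] = defaultdict(float)
    for s in sessions:
        credit[s["entry_source"]] += share
    return dict(credit)


# Four sessions from one visitor; the source labels are illustrative.
visits = [
    {"started_at": 100, "entry_source": "facebook"},
    {"started_at": 90_000, "entry_source": "google"},
    {"started_at": 200_000, "entry_source": "facebook"},
    {"started_at": 300_000, "entry_source": "direct"},
]
print(first_touch(visits))
print(linear_credit(visits))
```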

Sessions for the FB social-vs-paid split

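Combining sessions + the classifier is the whole point. Group sessions by entry_source and entry_medium, and Facebook paid and Facebook social land on separate rows.

That breakdown is the answer to the question every marketer asks and almost no analytics tool answers cleanly.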
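A rough sketch of that grouping over session rows already loaded into memory. The label values ("facebook", "paid", "social") are assumed for illustration and may not match the classifier's exact output.

```python
from collections import Counter


def source_breakdown(sessions: list[dict]) -> Counter:
    """Count sessions per (entry_source, entry_medium) pair."""
    return Counter((s["entry_source"], s["entry_medium"]) for s in sessions)


sessions = [
    {"entry_source": "facebook", "entry_medium": "paid"},
    {"entry_source": "facebook", "entry_medium": "social"},
    {"entry_source": "facebook", "entry_medium": "paid"},
    {"entry_source": "google", "entry_medium": "organic"},
]
for (source, medium), count in source_breakdown(sessions).most_common():
    print(source, medium, count)
```

Keying on the pair rather than on entry_source alone is what keeps paid and social traffic from the same platform apart.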

What sessions don't do

  • No cross-device stitching. A visitor on a phone and a laptop has two different anonymous_ids and therefore different sessions. Use user_id (post-identify) to roll those up.

  • No session-level revenue. Attach amount_cents to the conversion event itself; aggregate per-session in your query.

  • No engagement scoring out of the box. event_count, pageview_count, duration_s give you the raw material — score it however you like.
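The first two gaps can be closed at the query layer. The sketch below works on in-memory rows instead of a real query, with hypothetical function names: it sums amount_cents per session from raw events, then groups session ids by user_id to roll devices up.

```python
from collections import defaultdict


def revenue_by_session(events: list[dict]) -> dict[str, int]:
    """Sum amount_cents per session_id; events without an amount add nothing."""
    totals: dict[str, int] = defaultdict(int)
    for e in events:
        totals[e["session_id"]] += e.get("amount_cents", 0)
    return dict(totals)


def sessions_by_user(sessions: list[dict]) -> dict[str, list[str]]:
    """Group session ids under user_id, skipping sessions that never identified."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for s in sessions:
        if s.get("user_id"):
            grouped[s["user_id"]].append(s["session_id"])
    return dict(grouped)


events = [
    {"session_id": "u_phone-944444", "event": "$pageview"},
    {"session_id": "u_laptop-944445", "event": "purchase", "amount_cents": 4900},
    {"session_id": "u_laptop-944445", "event": "purchase", "amount_cents": 1500},
]
sessions = [
    {"session_id": "u_phone-944444", "user_id": "user_42"},
    {"session_id": "u_laptop-944445", "user_id": "user_42"},
    {"session_id": "u_other-944446", "user_id": None},
]
print(revenue_by_session(events))
print(sessions_by_user(sessions))
```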
