
@Lokimorty
Contributor

@Lokimorty Lokimorty commented Dec 3, 2025

Summary

Links anonymous browser sessions to authenticated user identities, enabling unified user journey tracking across login boundaries. Solves the "logged-out anonymous session → logged-in session" tracking gap, providing complete funnel visibility and accurate visitor deduplication. Addresses issue #3820.

What Changed

Client-side:

  • Persistent visitor ID in localStorage (enabled by default, opt-out via data-identity-stitching="false")
  • Graceful degradation when localStorage unavailable (Safari private browsing)
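
A minimal sketch of the persistence and fallback behaviour described above, not the tracker's actual code. The storage key `umami.visitor` follows the sequence diagram in the review further down this thread; `KeyValueStore`, `randomId` and `getVisitorId` are hypothetical names, and the ID format is an assumption.

```typescript
// Minimal storage surface so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Key name taken from the review's sequence diagram; treat it as illustrative.
const VISITOR_KEY = "umami.visitor";

// Illustrative 32-character hex ID; the real tracker may generate IDs differently.
function randomId(): string {
  let id = "";
  for (let i = 0; i < 32; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id;
}

// Returns the persisted visitor ID, creating one on first use.
// Returns undefined when storage is missing or throws, so the caller
// can keep tracking without a visitor ID (and no link is created).
function getVisitorId(store: KeyValueStore | undefined): string | undefined {
  if (!store) return undefined;
  try {
    const existing = store.getItem(VISITOR_KEY);
    if (existing) return existing;
    const created = randomId();
    store.setItem(VISITOR_KEY, created);
    return created;
  } catch {
    return undefined;
  }
}
```

In a browser the `store` argument would be `localStorage`. Because every tab reads the same key, all tabs of one browser share one visitor ID.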

Server-side:

  • New identity_link table linking visitors to distinct IDs (authenticated users)
  • Updated getWebsiteStats to deduplicate by resolved identity
  • Query patterns optimized for both PostgreSQL and ClickHouse

Database:

  • Prisma schema: IdentityLink model with createdAt/linkedAt timestamps
  • ClickHouse: ReplacingMergeTree with FINAL keyword for deduplication
  • Visitor ID field added to events and stats tables

Design Decisions

Hybrid approach: Client-side visitor persistence + server-side identity linking

  • Visitor ID: generated once per browser, persists via localStorage
  • Identity link: created when identify() called with authenticated user ID
  • Deduplication: stats queries join through identity_link to count distinct identities

Multi-account support:

  • One visitor → multiple distinct_ids (user logs into different accounts)
  • One distinct_id → multiple visitors (user on multiple devices)
  • Links are additive and never invalidated (preserves historical journey)
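
The many-to-many, additive behaviour above can be modelled in memory as follows. This is a sketch under assumptions: `IdentityLinks` and its methods are hypothetical stand-ins for the `identity_link` table, not the PR's Prisma or ClickHouse code.

```typescript
// In-memory stand-in for the identity_link table (hypothetical names).
class IdentityLinks {
  private pairs = new Set<string>();

  // Idempotent: linking the same pair twice leaves a single link.
  // Links are only ever added, never removed.
  link(visitorId: string, distinctId: string): void {
    this.pairs.add(JSON.stringify([visitorId, distinctId]));
  }

  // All accounts seen on one browser (multi-account case).
  distinctIdsFor(visitorId: string): string[] {
    const out: string[] = [];
    this.pairs.forEach((p) => {
      const [v, d] = JSON.parse(p) as [string, string];
      if (v === visitorId) out.push(d);
    });
    return out.sort();
  }

  // All browsers one account was seen on (multi-device case).
  visitorsFor(distinctId: string): string[] {
    const out: string[] = [];
    this.pairs.forEach((p) => {
      const [v, d] = JSON.parse(p) as [string, string];
      if (d === distinctId) out.push(v);
    });
    return out.sort();
  }
}
```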

Database asymmetry (intentional):

  • PostgreSQL: normalized schema, identity_link joined through session table
  • ClickHouse: denormalized visitor_id in website_event for query performance

Edge Cases

  • Safari private browsing: localStorage throws, visitorId is undefined, no link is created
  • localStorage cleared: a new visitorId is generated, which creates a new link
  • Multiple tabs: the same visitorId is shared via localStorage
  • Multiple devices: one distinct_id can link to multiple visitors
  • Retroactive attribution: the historical anonymous session is credited correctly

Test Plan

  • Enable feature (default enabled)
  • Anonymous pageview → confirm visitor_id in events
  • Call umami.identify('user1') → confirm identity_link created
  • Check stats: 1 visitor (deduplicated by resolved identity)
  • Log out, browse → stats still show 1 visitor
  • Disable feature with data-identity-stitching="false" → no visitor_id
  • Test Safari private browsing → no errors, gracefully handles
  • Verify ClickHouse: identity_link table populated, FINAL keyword works
  • Cross-device test: identify on device A, stats on device B
  • Funnel analysis: anonymous → login → conversion shows complete journey

…e#3820)

Adds automatic session linking/identity stitching to link anonymous
browsing sessions with authenticated user sessions.

## Changes

### Database Schema
- Add `identity_link` table (PostgreSQL + ClickHouse) to store mappings
  between visitor IDs and authenticated user IDs
- Add `visitor_id` field to `Session` model
- Add `visitor_id` column to ClickHouse `website_event` table

### Client Tracker
- Generate and persist `visitor_id` in localStorage
- Include `vid` in all tracking payloads
- Support opt-out via `data-identity-stitching="false"` attribute
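
A rough sketch of how the opt-out could gate the `vid` field, assuming the attribute is read as a raw string from the script tag. `TrackPayload`, `isStitchingEnabled` and `buildPayload` are hypothetical names, and the real payload carries more fields than shown here.

```typescript
// Only the fields discussed in this PR; the real payload has more.
interface TrackPayload {
  type: string;
  vid?: string;
}

// attr is the raw value of data-identity-stitching, or null when the
// attribute is absent. The feature is on unless it is exactly "false".
function isStitchingEnabled(attr: string | null): boolean {
  return attr !== "false";
}

// Adds vid only when stitching is enabled and a visitor ID is available.
function buildPayload(
  type: string,
  attr: string | null,
  visitorId: string | undefined,
): TrackPayload {
  const payload: TrackPayload = { type };
  if (isStitchingEnabled(attr) && visitorId) {
    payload.vid = visitorId;
  }
  return payload;
}
```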

### API
- Accept `vid` parameter in `/api/send` endpoint
- Auto-create identity links when `identify()` is called with both
  visitor_id and distinct_id
- Store visitor_id in sessions and events
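
The link-creation rule above might look roughly like this on the server. This is a sketch, not the actual `/api/send` handler: `SendBody`, `CreateLink` and `maybeLinkIdentity` are hypothetical names, and the field names `type`, `vid` and `id` follow the payloads in the review's sequence diagram. The link write is started but not awaited, so a slow or failing write does not delay the tracking response.

```typescript
// Subset of the request body relevant to identity linking.
interface SendBody {
  type: string;
  vid?: string; // visitor ID from the tracker
  id?: string;  // distinct ID passed to identify()
}

type CreateLink = (visitorId: string, distinctId: string) => Promise<void>;

// Starts link creation for identify calls that carry both IDs.
// Returns true when a link write was started. The promise is not
// awaited; failures go to onError instead of failing the request.
function maybeLinkIdentity(
  body: SendBody,
  createLink: CreateLink,
  onError: (err: unknown) => void,
): boolean {
  if (body.type !== "identify" || !body.vid || !body.id) {
    return false;
  }
  createLink(body.vid, body.id).catch(onError);
  return true;
}
```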

### Query Updates
- Update `getWebsiteStats` to deduplicate visitors by resolved identity
- Visitors who browse anonymously then log in are now counted as one user

## Usage

When a user logs in, call `umami.identify(userId)`. If identity stitching
is enabled (default), the tracker automatically links the anonymous
visitor_id to the authenticated userId. Stats queries then resolve
linked identities to accurately count unique visitors.
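
As a usage sketch, a login handler could wrap the call so that a missing tracker (blocked script, not yet loaded) does not break the login flow. `UmamiTracker` and `identifyOnLogin` are hypothetical names; only `identify(userId)` comes from this PR.

```typescript
// The part of the tracker's surface used here.
interface UmamiTracker {
  identify(id: string): void;
}

// Calls identify() after a successful login.
// Returns false when the tracker is unavailable or the ID is empty.
function identifyOnLogin(
  tracker: UmamiTracker | undefined,
  userId: string,
): boolean {
  if (!tracker || !userId) return false;
  tracker.identify(userId);
  return true;
}
```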

Resolves umami-software#3820
…mi-software#3820)

Links anonymous browser sessions to authenticated user identities, enabling unified
user journey tracking across login boundaries. This solves the "logged-out anonymous
session → logged-in session" tracking gap, providing complete funnel visibility and
accurate visitor deduplication.

## Changes

- Client-side: Persistent visitor ID in localStorage (data-identity-stitching attribute)
- Server-side: identity_link table linking visitors to distinct IDs (authenticated users)
- Query updates: getWebsiteStats now deduplicates by resolved identity
- Graceful degradation: tracking continues without a visitor ID in Safari private browsing and whenever localStorage is unavailable

## Implementation Details

Uses a hybrid approach combining client-side persistence with server-side linking:
- Visitor ID is generated once per browser and persists across sessions
- When a user logs in, identify() creates an identity link
- Stats queries join through identity_link to deduplicate cross-device sessions

Both PostgreSQL and ClickHouse supported with appropriate query patterns:
- PostgreSQL: normalized schema, joins through session table
- ClickHouse: denormalized with ReplacingMergeTree for deduplication

## Edge Cases Handled

- Safari private browsing: localStorage throws, visitorId undefined, no link created
- localStorage cleared: new visitorId generated, creates new link
- Multiple tabs: same visitorId shared via localStorage
- Multiple devices: one distinct_id can link to multiple visitors
- Multiple accounts: one visitor can link to multiple distinct_ids

## Test Plan

- [ ] Enable feature on test website (default enabled)
- [ ] Anonymous pageview - confirm visitor_id in events table
- [ ] Call umami.identify('user1') - confirm identity_link created
- [ ] Stats show 1 visitor (deduplicated)
- [ ] Log out, browse anonymously, stats still show 1 visitor
- [ ] Test with data-identity-stitching="false" - no visitor_id collected
- [ ] Test in Safari private browsing - no errors, gracefully skips
- [ ] Test ClickHouse: verify identity_link table populated and FINAL keyword works
- [ ] Verify retroactive: historical anonymous session attributed correctly
@vercel

vercel bot commented Dec 3, 2025

@Lokimorty is attempting to deploy a commit to the umami-software Team on Vercel.

A member of the Team first needs to authorize it.

@Lokimorty Lokimorty changed the title from "feat: implement automatic session linking and identity stitching (#3820)" to "feat: implement automatic session linking and identity stitching" Dec 3, 2025
@greptile-apps
Contributor

greptile-apps bot commented Dec 3, 2025

Greptile Overview

Greptile Summary

This PR implements identity stitching to link anonymous browser sessions with authenticated user identities across login boundaries. The implementation adds a persistent visitor_id in localStorage (client-side) and a new identity_link table (server-side) to track relationships between visitors and authenticated users.

Major Changes:

  • Client-side: persistent visitor ID generation with graceful fallback for Safari private browsing
  • Database: new identity_link table (Prisma + ClickHouse) with visitor_id fields added to session/event tables
  • Server: identity link creation on identify() calls, visitor deduplication in stats queries
  • Query optimization: uses LEFT JOIN with identity_link for visitor resolution

Critical Issue Found:
The visitor counting logic in getWebsiteStats.ts has a critical bug that will cause visitor inflation. The query uses coalesce(resolved_identity, session_id) which counts visitors by different identifiers before vs after identify() is called, resulting in the same physical person being counted as 2+ visitors. The fix requires adding visitor_id to the coalesce chain: coalesce(distinct_id, visitor_id, session_id).
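
The actual SQL is not shown in this thread, so the following is only an in-memory model of the two coalesce chains. It assumes `distinct_id` is resolved by joining each row to `identity_link` on `visitor_id`, and simplifies to one link per visitor. `EventRow` and `countVisitors` are hypothetical names.

```typescript
interface EventRow {
  sessionId: string;
  visitorId?: string; // absent when localStorage was unavailable
}

// links maps visitor_id -> distinct_id (one link per visitor is a
// simplification; the PR allows several).
function countVisitors(
  rows: EventRow[],
  links: Map<string, string>,
  useVisitorFallback: boolean,
): number {
  const keys = new Set<string>();
  rows.forEach((row) => {
    const distinctId = row.visitorId ? links.get(row.visitorId) : undefined;
    const key = useVisitorFallback
      ? (distinctId ?? row.visitorId ?? row.sessionId) // coalesce(distinct_id, visitor_id, session_id)
      : (distinctId ?? row.sessionId);                 // coalesce(resolved_identity, session_id)
    keys.add(key);
  });
  return keys.size;
}
```

In this model, rows that have no link are keyed by `visitor_id` instead of `session_id`, so a returning anonymous browser is counted once rather than once per session.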

What Works Well:

  • Proper upsert pattern prevents duplicate identity links
  • Fire-and-forget identity link creation won't block tracking requests
  • ClickHouse uses ReplacingMergeTree with FINAL keyword correctly
  • Client gracefully handles localStorage exceptions (Safari private mode)
  • Database schema changes are comprehensive with proper indexes

Confidence Score: 1/5

  • This PR has a critical visitor counting bug that will cause incorrect analytics data
  • Score reflects a critical logical error in the core stats query that defeats the purpose of the feature - visitor deduplication will not work correctly until the coalesce chain includes visitor_id as a fallback. Without this fix, users will see inflated visitor counts (same person counted multiple times as they transition from anonymous to authenticated). The rest of the implementation is solid, but this bug must be fixed before merge.
  • Pay close attention to src/queries/sql/getWebsiteStats.ts - the visitor counting logic needs immediate correction across all three query paths (Postgres raw events, ClickHouse raw events, ClickHouse hourly aggregates)

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| src/queries/sql/getWebsiteStats.ts | 1/5 | critical visitor counting bug - double-counts users before/after login, needs visitor_id fallback in coalesce chain |
| src/tracker/index.js | 5/5 | client-side visitor ID generation with localStorage persistence, proper fallback for Safari private browsing |
| src/app/api/send/route.ts | 4/5 | adds visitor_id handling to tracking endpoint, fire-and-forget identity link creation properly catches errors |
| src/queries/sql/identity/createIdentityLink.ts | 5/5 | upsert pattern for identity links with proper Prisma/ClickHouse implementations, handles idempotency correctly |
| prisma/schema.prisma | 5/5 | adds IdentityLink model with proper indexes, visitor_id field on Session with index for efficient lookups |
| db/clickhouse/schema.sql | 5/5 | adds visitor_id column to event tables and identity_link table with ReplacingMergeTree, proper materialized view updates |

Sequence Diagram

sequenceDiagram
    participant Browser
    participant localStorage
    participant Tracker
    participant API
    participant DB
    
    Note over Browser,DB: Initial Anonymous Visit
    Browser->>Tracker: Load umami tracker
    Tracker->>localStorage: getItem('umami.visitor')
    localStorage-->>Tracker: null (first visit)
    Tracker->>Tracker: Generate UUID (visitorId)
    Tracker->>localStorage: setItem('umami.visitor', visitorId)
    Tracker->>API: POST /api/send {type: 'event', vid: visitorId}
    API->>DB: createSession(visitorId, sessionId)
    API->>DB: saveEvent(visitorId, sessionId)
    
    Note over Browser,DB: User Continues Browsing (Anonymous)
    Browser->>Tracker: Page view
    Tracker->>localStorage: getItem('umami.visitor')
    localStorage-->>Tracker: visitorId (persisted)
    Tracker->>API: POST /api/send {type: 'event', vid: visitorId}
    API->>DB: saveEvent(visitorId, sessionId)
    
    Note over Browser,DB: User Logs In / Identifies
    Browser->>Tracker: umami.identify('user-123')
    Tracker->>localStorage: getItem('umami.visitor')
    localStorage-->>Tracker: visitorId (same as before)
    Tracker->>API: POST /api/send {type: 'identify', vid: visitorId, id: 'user-123'}
    API->>DB: saveSessionData(distinctId: 'user-123')
    API->>DB: createIdentityLink(visitorId, 'user-123')
    Note right of DB: Links anonymous visitorId<br/>to authenticated user-123
    
    Note over Browser,DB: Post-Login Activity
    Browser->>Tracker: Page view
    Tracker->>localStorage: getItem('umami.visitor')
    localStorage-->>Tracker: visitorId (same visitor)
    Tracker->>API: POST /api/send {type: 'event', vid: visitorId, id: 'user-123'}
    API->>DB: saveEvent(visitorId, 'user-123')
    
    Note over Browser,DB: Stats Query (Deduplication)
    API->>DB: getWebsiteStats()
    DB->>DB: JOIN identity_link ON visitor_id
    DB->>DB: COUNT DISTINCT resolved_identity
    DB-->>API: visitors: 1 (deduplicated)
    Note right of DB: Pre-login and post-login<br/>sessions counted as 1 visitor
Contributor

@greptile-apps greptile-apps bot left a comment

11 files reviewed, 3 comments


Fixes visitor inflation bug where same person was counted twice:
- Once as session_id (before identify)
- Once as distinct_id (after identify)

The coalesce chain now uses visitor_id as fallback before session_id,
ensuring consistent counting across the identify() boundary.
@Lokimorty
Contributor Author

fixed the visitor counting issue flagged by the bot and resolved the comments

@Lokimorty Lokimorty changed the title from "feat: implement automatic session linking and identity stitching" to "feat: automatic session linking and identity stitching" Dec 3, 2025