feat: automatic session linking and identity stitching #3825
…e#3820)

Adds automatic session linking/identity stitching to link anonymous browsing sessions with authenticated user sessions.

## Changes

### Database Schema
- Add `identity_link` table (PostgreSQL + ClickHouse) to store mappings between visitor IDs and authenticated user IDs
- Add `visitor_id` field to `Session` model
- Add `visitor_id` column to ClickHouse `website_event` table

### Client Tracker
- Generate and persist `visitor_id` in localStorage
- Include `vid` in all tracking payloads
- Support opt-out via `data-identity-stitching="false"` attribute

### API
- Accept `vid` parameter in `/api/send` endpoint
- Auto-create identity links when `identify()` is called with both visitor_id and distinct_id
- Store visitor_id in sessions and events

### Query Updates
- Update `getWebsiteStats` to deduplicate visitors by resolved identity
- Visitors who browse anonymously then log in are now counted as one user

## Usage

When a user logs in, call `umami.identify(userId)`. If identity stitching is enabled (default), the tracker automatically links the anonymous visitor_id to the authenticated userId. Stats queries then resolve linked identities to accurately count unique visitors.

Resolves umami-software#3820
…mi-software#3820)

Links anonymous browser sessions to authenticated user identities, enabling unified user journey tracking across login boundaries. This solves the "logged-out anonymous session → logged-in session" tracking gap, providing complete funnel visibility and accurate visitor deduplication.

## Changes
- Client-side: Persistent visitor ID in localStorage (data-identity-stitching attribute)
- Server-side: identity_link table linking visitors to distinct IDs (authenticated users)
- Query updates: getWebsiteStats now deduplicates by resolved identity
- Graceful degradation: Works in Safari private browsing and when localStorage is unavailable

## Implementation Details

Uses a hybrid approach combining client-side persistence with server-side linking:
- Visitor ID is generated once per browser and persists across sessions
- When the user logs in, identify() creates an identity link
- Stats queries join through identity_link to deduplicate cross-device sessions

Both PostgreSQL and ClickHouse are supported with appropriate query patterns:
- PostgreSQL: normalized schema, joins through session table
- ClickHouse: denormalized with ReplacingMergeTree for deduplication

## Edge Cases Handled
- Safari private browsing: localStorage throws, visitorId undefined, no link created
- localStorage cleared: new visitorId generated, creates new link
- Multiple tabs: same visitorId shared via localStorage
- Multiple devices: one distinct_id can link to multiple visitors
- Multiple accounts: one visitor can link to multiple distinct_ids

## Test Plan
- [ ] Enable feature on test website (default enabled)
- [ ] Anonymous pageview - confirm visitor_id in events table
- [ ] Call umami.identify('user1') - confirm identity_link created
- [ ] Stats show 1 visitor (deduplicated)
- [ ] Log out, browse anonymously, stats still show 1 visitor
- [ ] Test with data-identity-stitching="false" - no visitor_id collected
- [ ] Test in Safari private browsing - no errors, gracefully skips
- [ ] Test ClickHouse: verify identity_link table populated and FINAL keyword works
- [ ] Verify retroactive: historical anonymous session attributed correctly
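The client-side behaviour described above (generate once, persist, degrade gracefully when storage is unavailable) could be sketched roughly as follows. This is not the PR's tracker code: `StorageLike`, `randomId`, and `getVisitorId` are hypothetical names, the storage key `umami.visitor` is taken from the review's sequence diagram, and the ID generator is a simple stand-in.

```typescript
// Minimal subset of the Web Storage API used by this sketch (hypothetical type).
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Key name as it appears in the review's sequence diagram.
const VISITOR_KEY = 'umami.visitor';

// Stand-in ID generator (assumption): 32 random hex characters.
// A real tracker would more likely use a proper UUID source.
function randomId(): string {
  let id = '';
  for (let i = 0; i < 32; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id;
}

// Returns the persisted visitor ID, creating one on first use.
// Returns undefined when stitching is disabled or storage is unusable,
// so the caller can simply omit `vid` from the payload.
function getVisitorId(
  storage: StorageLike | undefined,
  enabled: boolean = true,
  generate: () => string = randomId,
): string | undefined {
  if (!enabled || !storage) return undefined;
  try {
    const existing = storage.getItem(VISITOR_KEY);
    if (existing) return existing;
    const created = generate();
    storage.setItem(VISITOR_KEY, created);
    return created;
  } catch {
    // Storage access threw (for example in a restricted browsing mode): skip stitching.
    return undefined;
  }
}
```

Passing the storage object in as a parameter keeps the sketch testable outside a browser; in a page it would be called with `window.localStorage` and the value of the opt-out attribute.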
@Lokimorty is attempting to deploy a commit to the umami-software Team on Vercel. A member of the Team first needs to authorize it.
Greptile Overview
Greptile Summary
This PR implements identity stitching to link anonymous browser sessions with authenticated user identities across login boundaries. The implementation adds a persistent
Major Changes:
Critical Issue Found:
What Works Well:
Confidence Score: 1/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant Browser
participant localStorage
participant Tracker
participant API
participant DB
Note over Browser,DB: Initial Anonymous Visit
Browser->>Tracker: Load umami tracker
Tracker->>localStorage: getItem('umami.visitor')
localStorage-->>Tracker: null (first visit)
Tracker->>Tracker: Generate UUID (visitorId)
Tracker->>localStorage: setItem('umami.visitor', visitorId)
Tracker->>API: POST /api/send {type: 'event', vid: visitorId}
API->>DB: createSession(visitorId, sessionId)
API->>DB: saveEvent(visitorId, sessionId)
Note over Browser,DB: User Continues Browsing (Anonymous)
Browser->>Tracker: Page view
Tracker->>localStorage: getItem('umami.visitor')
localStorage-->>Tracker: visitorId (persisted)
Tracker->>API: POST /api/send {type: 'event', vid: visitorId}
API->>DB: saveEvent(visitorId, sessionId)
Note over Browser,DB: User Logs In / Identifies
Browser->>Tracker: umami.identify('user-123')
Tracker->>localStorage: getItem('umami.visitor')
localStorage-->>Tracker: visitorId (same as before)
Tracker->>API: POST /api/send {type: 'identify', vid: visitorId, id: 'user-123'}
API->>DB: saveSessionData(distinctId: 'user-123')
API->>DB: createIdentityLink(visitorId, 'user-123')
Note right of DB: Links anonymous visitorId<br/>to authenticated user-123
Note over Browser,DB: Post-Login Activity
Browser->>Tracker: Page view
Tracker->>localStorage: getItem('umami.visitor')
localStorage-->>Tracker: visitorId (same visitor)
Tracker->>API: POST /api/send {type: 'event', vid: visitorId, id: 'user-123'}
API->>DB: saveEvent(visitorId, 'user-123')
Note over Browser,DB: Stats Query (Deduplication)
API->>DB: getWebsiteStats()
DB->>DB: JOIN identity_link ON visitor_id
DB->>DB: COUNT DISTINCT resolved_identity
DB-->>API: visitors: 1 (deduplicated)
Note right of DB: Pre-login and post-login<br/>sessions counted as 1 visitor
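The identify step in the diagram (a link is created only when both `vid` and `id` are present) can be modelled in memory as below. This is a sketch under assumptions, not the PR's server code: `SendPayload`, `IdentityLinkStore`, and `handleSend` are hypothetical names, and a `Set` stands in for the `identity_link` table.

```typescript
// Hypothetical shape of the relevant /api/send fields, following the diagram.
interface SendPayload {
  type: 'event' | 'identify';
  vid?: string; // visitor ID from the tracker; absent when stitching is off
  id?: string;  // distinct ID passed to identify()
}

// In-memory stand-in for the identity_link table: a set of (visitor, distinct) pairs.
class IdentityLinkStore {
  private links = new Set<string>();

  // Idempotent insert: repeating the same pair does not add a second entry.
  link(visitorId: string, distinctId: string): void {
    this.links.add(JSON.stringify([visitorId, distinctId]));
  }

  // All distinct IDs linked to one visitor (supports many-to-many links).
  distinctIdsFor(visitorId: string): string[] {
    const result: string[] = [];
    this.links.forEach((key) => {
      const pair = JSON.parse(key) as [string, string];
      if (pair[0] === visitorId) result.push(pair[1]);
    });
    return result;
  }

  get size(): number {
    return this.links.size;
  }
}

// Creates a link only for identify calls that carry both IDs.
// Returns true when a link was written (or already existed).
function handleSend(payload: SendPayload, store: IdentityLinkStore): boolean {
  if (payload.type === 'identify' && payload.vid && payload.id) {
    store.link(payload.vid, payload.id);
    return true;
  }
  return false;
}
```

Keying the store on the pair rather than on the visitor alone is what lets one visitor link to several distinct IDs and one distinct ID link to several visitors.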
11 files reviewed, 3 comments
Fixes visitor inflation bug where the same person was counted twice:
- Once as session_id (before identify)
- Once as distinct_id (after identify)

The coalesce chain now uses visitor_id as fallback before session_id, ensuring consistent counting across the identify() boundary.
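To illustrate the idea of a coalesce chain with a `visitor_id` fallback, here is a small in-memory sketch. The exact order used in the PR's SQL is not shown in this thread, so the order below (linked distinct ID, then the session's own distinct ID, then visitor ID, then session ID) is an assumption, as are the names `SessionRow`, `resolveIdentity`, and `countVisitors`. For simplicity the sketch maps each visitor to at most one distinct ID.

```typescript
// Hypothetical row shape; only the fields relevant to identity resolution.
interface SessionRow {
  sessionId: string;
  visitorId?: string;
  distinctId?: string;
}

// Resolve one identity per session, in priority order (assumed):
// linked distinct ID, session's own distinct ID, visitor ID, session ID.
function resolveIdentity(row: SessionRow, links: Map<string, string>): string {
  const linked = row.visitorId !== undefined ? links.get(row.visitorId) : undefined;
  return linked ?? row.distinctId ?? row.visitorId ?? row.sessionId;
}

// Count unique visitors by resolved identity rather than by session.
function countVisitors(rows: SessionRow[], links: Map<string, string>): number {
  const seen = new Set<string>();
  rows.forEach((row) => seen.add(resolveIdentity(row, links)));
  return seen.size;
}

// Example: one anonymous session and one identified session from the same browser.
const exampleRows: SessionRow[] = [
  { sessionId: 's1', visitorId: 'v1' },
  { sessionId: 's2', visitorId: 'v1', distinctId: 'user-123' },
];
const exampleLinks = new Map<string, string>([['v1', 'user-123']]);
const exampleCount = countVisitors(exampleRows, exampleLinks);
```

With the link in place both sessions resolve to `user-123`; without the visitor fallback, two anonymous sessions from the same browser would each fall through to their own session ID and be counted separately.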
Fixed the visitor counting issue flagged by the bot and resolved the comments.
Summary
Links anonymous browser sessions to authenticated user identities, enabling unified user journey tracking across login boundaries. Solves the "logged-out anonymous session → logged-in session" tracking gap, providing complete funnel visibility and accurate visitor deduplication. Addresses issue #3820.
What Changed
Client-side:
- Opt-out via `data-identity-stitching="false"`
Server-side:
- `identity_link` table linking visitors to distinct IDs (authenticated users)
- `getWebsiteStats` updated to deduplicate by resolved identity
Database:
Design Decisions
Hybrid approach: Client-side visitor persistence + server-side identity linking
- `identify()` called with authenticated user ID
Multi-account support:
Database asymmetry (intentional):
Edge Cases
Test Plan
- `umami.identify('user1')` → confirm identity_link created
- `data-identity-stitching="false"` → no visitor_id