fix(replay): stop polling preload-as-style <link> elements forever#3667
Conversation
Session recorder treated <link rel="preload" as="style" href="*.css"> as if it
were a stylesheet and waited for link.sheet to populate. Per spec preload
links never instantiate a CSSStyleSheet, so the wait timed out, recursively
re-serialized the link, scheduled another wait, and leaked a 'load' listener
on every cycle - multiplying further on every real load event. Pages with
Next.js-style CSS preloads accumulated thousands of active polling chains,
saturating the main thread and freezing the tab on refocus.
- Drop preload from the predicate so only rel=stylesheet schedules a wait.
- Replace the recursive serializeNodeWithId call with the serializedNode
already in scope.
- Track tracked links in a WeakSet so repeat calls are no-ops; gate the load
handler on the same 'fired' guard as the timer; pass { once: true } so the
listener self-removes if a load event ever fires.
Added jsdom unit tests in rrweb-snapshot and a real-browser Playwright spec
that loads a page with five preload-as-style links, instruments
HTMLLinkElement.prototype.addEventListener, and asserts the count stays
bounded across timer cycles plus dispatched load events. Without the fix
the spec sees ~30 leaked listeners; with the fix it sees zero.
Generated-By: PostHog Code
Task-Id: 18dbe2b5-9a25-4b4c-a756-db029804f620
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
This comment has been minimized.
This comment has been minimized.
|
Size Change: +6.09 kB (+0.04%) Total Size: 16.4 MB
ℹ️ View Unchanged
|
pauldambra
left a comment
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
Multi-perspective review: qa-team (specialists + generalists), paul-reviewer, xp-reviewer, security-audit
Verdict: ⚠️ REQUEST CHANGES
One convergent HIGH correctness finding (paul + qa-team/reliability): the new code reuses the pre-load serializedNode instead of re-running serializeNodeWithId on load, which silently drops _cssText for late-loading external stylesheets — StylesheetManager.attachLinkElement gates the mutation on '_cssText' in attributes, so the CSS never reaches the replay.
Key findings
- 🟠 HIGH (convergent: paul + qa-team/reliability) —
snapshot.ts:1301— late-loading<link rel="stylesheet">loses_cssTextin the replay because the old recursiveserializeNodeWithIdis gone. - 🟡 MEDIUM (convergent: paul + xp + qa-team/generalist-b) — Playwright spec — assertion bound
toBeLessThanOrEqual(5)is too loose; should betoBe(0)(preload links now bypassonceStylesheetLoadedentirely). - 🟡 MEDIUM (convergent: paul + xp + qa-team/generalist-a) —
snapshot.test.ts— three near-identical tests; project conventions prefer parameterized tests, and none of them assert that_cssTextreaches theonStylesheetLoadpayload. - 🟡 MEDIUM (qa-team/reliability + paul) — module-level
stylesheetLoadTrackedWeakSet not reset across recorder lifecycles. - 🟡 MEDIUM (qa-team/compatibility) — preload-as-style branch removed; confirm no replay was relying on it for CSS capture.
- 🟢 LOW (xp) — triple idempotency (WeakSet +
{once:true}+fired) reads as belt-and-braces; consider trimming. - ⚪ NIT — WeakSet name, playground comment, changeset wording.
Convergence
_cssTextregression: paul + qa-team/reliability (highest confidence).- Playwright assertion looseness: paul + xp + qa-team/generalist-b.
- Test parameterization + missing
_cssTextpayload assertion: paul + xp + qa-team/generalist-a. - WeakSet cross-lifecycle: paul + qa-team/reliability.
Reviewer summaries
| Reviewer | Assessment |
|---|---|
| 🔍 qa-team | REQUEST CHANGES — leak fix is correct, but the recursive-call removal regresses CSS capture for late-loading stylesheets. |
| 👤 paul | REQUEST CHANGES — solid trace-driven write-up, but wants the late-loading _cssText path either restored or covered by a test before stamping. |
| 📐 xp | APPROVE WITH NITS — the predicate simplification is good XP work; idempotency triple-guard and the un-parameterised tests are the cleanups he'd want. |
| 🛡 security-audit | APPROVE — strictly shrinks attack surface; no new sinks, no new user-input paths, no exploitable change. |
Automated by QA Swarm — not a human review
… replay qa-swarm found a convergent HIGH-severity regression: dropping the recursive serializeNodeWithId call inside the load callback meant late-loading <link rel="stylesheet"> elements would deliver their original pre-load serializedNode to onStylesheetLoad — which has no _cssText, so StylesheetManager.attachLinkElement (gated on '_cssText' in attributes) never emitted the cssText mutation. Every replay with a stylesheet not loaded by first snapshot would render unstyled. The new WeakSet idempotency guard makes restoring the recursive call safe: re-entry into onceStylesheetLoaded sees the link in the WeakSet and returns early, so the chain cannot re-arm. Also from qa-swarm: - Parameterize the three new unit tests and add a fourth case that populates link.sheet between first serialize and load, asserting _cssText reaches onStylesheetLoad (regression coverage for the bug this commit fixes). - Tighten the Playwright assertion from toBeLessThanOrEqual(5) to toBe(0) — preload-as-style links bypass onceStylesheetLoaded entirely now. - HTML comment in the leak-repro playground explaining the CSS chunks are intentionally non-existent. Generated-By: PostHog Code Task-Id: 18dbe2b5-9a25-4b4c-a756-db029804f620
The fix lives in packages/rrweb/rrweb-snapshot/src/snapshot.ts so the rrweb-snapshot package needs the same patch bump as posthog-js. Flagged by the changeset-hygiene action. Generated-By: PostHog Code Task-Id: 18dbe2b5-9a25-4b4c-a756-db029804f620
…sets
The module-level stylesheetLoadTracked WeakSet persisted for the document
lifetime, so a recorder stop/restart on the same page (SPA toggling
session recording) silently skipped re-tracking any <link rel="stylesheet">
whose load was still pending from the previous lifecycle. The pending
{ once: true } listener was closed over the stopped recorder's
onStylesheetLoad, so the second recorder never received the _cssText
mutation when the sheet finally loaded.
Pre-fix this didn't bite because there was no skip — each call attached a
fresh listener (which was the leak being fixed). The WeakSet now resets
inside StylesheetManager.reset() so the second recorder sees a clean
tracker.
Added a jsdom test that exercises the cross-lifecycle path: serialize a
pending link, call resetStylesheetLoadTracking, serialize it again,
populate sheet, dispatch load — assert the second session receives a
_cssText payload.
Note: the WeakSet still lives in rrweb-snapshot rather than in
StylesheetManager. Moving it into the recorder would couple the snapshot
package to load-tracking state it does not own; threading the WeakSet
through serializeNodeWithId options is the cleaner long-term refactor.
For now the reset hook is the minimal fix.
Generated-By: PostHog Code
Task-Id: 18dbe2b5-9a25-4b4c-a756-db029804f620
pauldambra
left a comment
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
Incremental review of ff5f56d5 (Option A: reset hook for stylesheetLoadTracked)
Verdict: 💬 APPROVE WITH NITS
Fix correctly addresses the previously deferred cross-lifecycle finding. Reset is wired into StylesheetManager.reset(), mirrors the existing resetMaxDepthState precedent in the same file, and the new jsdom test genuinely exercises the fix (without the reset, the second-session capture would never fire). xp and security-audit returned no findings.
Key findings
- 🟢 LOW (qa-team/reliability) —
StylesheetManager.reset()runs fromtakeFullSnapshoton every checkout, not only on recorder stop/restart. Pending links can therefore acquire a second{ once: true }listener at each checkout, emitting duplicateattachLinkElementmutations when the sheet finally loads. Sameid, same_cssText— not corrupt, just churn. Tracking-ticket worthy. - ⚪ NIT (paul + qa-team/compatibility) — module-level
let stylesheetLoadTrackedis fine because the only reader (onceStylesheetLoaded) resolves the binding at call time, and no captured references exist. Unusual for this file; flagging for future readers. - ⚪ NIT (paul) — new test assertion
expect(firstSessionCalls).toBeLessThanOrEqual(1)is hand-wavy; the actual expected value is0sinceloadis dispatched only after the reset. - ⚪ NIT (qa-team/reliability) — new cross-lifecycle test only exercises the
load-event delivery path. Timer fallback is covered bynoMultiplicationCasesbut not specifically across reset.
Reviewer summaries
| Reviewer | Assessment |
|---|---|
| 🔍 qa-team | APPROVE WITH NITS — fix is right; flagging the every-checkout reset as follow-up. |
| 👤 paul | APPROVE WITH NITS — trust on the WeakSet swap; tighten the firstSessionCalls assertion. |
| 📐 xp | APPROVE — small, surgical, mirrors resetMaxDepthState precedent. No findings. |
| 🛡 security-audit | APPROVE — module-level reset hook only; no new sinks or user-input paths. No findings. |
Automated by QA Swarm — not a human review
qa-swarm v2 flagged toBeLessThanOrEqual(1) as hand-wavy. The actual value
is 1, not 0 — even after resetStylesheetLoadTracking(), the first session's
{ once: true } load listener is still attached to the link element (reset
only clears the dedup tracker, not the listeners themselves). When the
sheet finally loads, both session 1's and session 2's listeners fire.
Pinning to toBe(1) documents this trade-off explicitly. The cost is one
duplicate attachLinkElement mutation per checkout-pending link (same id,
same _cssText — applied idempotently by the replayer). Tracked separately
as a follow-up to abort previous-snapshot listeners on reset.
Generated-By: PostHog Code
Task-Id: 18dbe2b5-9a25-4b4c-a756-db029804f620
CI lint failed: my multi-line function signature broke prettier's printWidth. Auto-fixed with `prettier --write`. Generated-By: PostHog Code Task-Id: 18dbe2b5-9a25-4b4c-a756-db029804f620
pauldambra
left a comment
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
Incremental review of b74d3926 (Map<…, {timer, onLoad}> → Map<…, AbortController>)
Verdict: 💬 APPROVE WITH NITS
3ed123d + b74d392 together resolve the previous LOW-severity duplicate-mutation concern. Pending watches now carry an AbortController, resetStylesheetLoadTracking() aborts each, and the { signal }-bound load listener is removed declaratively. Test assertion flip from 1 to 0 documents the desired behaviour (previous session's listener is genuinely gone after reset).
Findings (all NIT)
- ⚪ paul —
controller.abort()infire()'sfinallyreads as belt-and-braces; worth confirming it's intentional. - ⚪ paul — module-level mutable state +
reset()aborting in-flight waits has the #1708 concern shape; flagged for awareness. - ⚪ xp — using
AbortControllerfor cleanup is indirect compared to acleanup()closure that doesclearTimeout + removeEventListener. Trade-off, not a bug. - ⚪ xp —
try/finallyintent ("always tear down even if listener throws") could be named via extractedcleanupclosure. - ⚪ paul + xp (convergent) — test name
re-tracks a pending stylesheet link after resetStylesheetLoadTrackingis fine; a more behaviour-focused name ("does not invoke first-session listener after reset") would protect regression intent. - ⚪ paul — changeset addition for
@posthog/rrweb: positive note.
qa-team and security-audit: no findings.
Reviewer summaries
| Reviewer | Assessment |
|---|---|
| 🔍 qa-team | APPROVE — refactor is sound, semantics correct, AbortController scoping right. |
| 👤 paul | APPROVE WITH NITS — high trust, small change, ship it. |
| 📐 xp | APPROVE WITH NITS — refactor is a genuine lifecycle simplification; nits are about expressiveness. |
| 🛡 security-audit | APPROVE — internal helper, no new sinks, no new untrusted input. |
Automated by QA Swarm — not a human review
qa-swarm convergent NIT (paul + xp): the old name described the mechanism
("re-tracks a pending stylesheet link after resetStylesheetLoadTracking")
not the behaviour the test guards. The new name leads with what the
test asserts — that the previous session's listener is torn down — so
a future reader sees the regression intent without diffing the body.
Generated-By: PostHog Code
Task-Id: 18dbe2b5-9a25-4b4c-a756-db029804f620
… tracking End-to-end coverage of the listener-teardown-on-reset fix in a real browser. The spec: - Loads a page with no stylesheet in HTML. - Holds the CSS response via a Playwright route. - Injects a <link rel="stylesheet"> after recording starts (mutation observer picks it up). - Configures session_recording.full_snapshot_interval_millis to 1500ms so rrweb's takeFullSnapshot fires repeatedly, each one calling StylesheetManager.reset() -> resetStylesheetLoadTracking(). Each checkout reschedules a fresh load watch on the still-pending link. - Releases the CSS only after several checkouts have happened. - Asserts the captured event stream contains exactly one _cssText attribute mutation for the link's mirror id. Verified the spec catches the regression: with resetStylesheetLoadTracking neutered to skip the controller.abort() loop, the spec sees 3 _cssText mutations (one per checkout listener that wasn't torn down). With the abort restored, exactly 1. Generated-By: PostHog Code Task-Id: 18dbe2b5-9a25-4b4c-a756-db029804f620
CI showed the spec failing on Firefox and WebKit with Received: 0 — those browsers do not populate link.sheet.cssRules from a Playwright-route- fulfilled CSS response within our wait window. The end-to-end load delivery doesn't complete in time, so the mutation is never emitted. The fix being verified is JS-internal (Map+AbortController teardown on reset) and browser-agnostic. The jsdom unit tests cover the logic deterministically. The Playwright spec adds Chromium end-to-end confirmation; non-Chromium coverage would need a different fixture strategy (e.g., serving the CSS via a real test server instead of route interception) and isn't worth the complexity here. Generated-By: PostHog Code Task-Id: 18dbe2b5-9a25-4b4c-a756-db029804f620

Problem
A customer reported (with a detailed trace) that the session recorder was saturating the main thread on pages with
<link rel="preload" as="style" href="*.css">elements — common in Next.js-chunked apps. They measured ~1,440 concurrently active polling chains on a single preload link after ~2 hours, firing ~408/s and consuming ~55% of the main thread. Backgrounding the tab + refocusing produced multi-hundred-ms freezes as Chrome flushed the queued timers.Root cause is in
rrweb-snapshot'sonceStylesheetLoadedpath, which three defects compose into a runaway:rel="preload" as="style"the same asrel="stylesheet", but per spec preload links never instantiate aCSSStyleSheet—link.sheetstaysnullforever, so the timeout always fires.onceStylesheetLoadednever removes itsloadlistener, has no idempotency guard, and the load handler is not gated byfired— so every real load event multiplies the chain.serializeNodeWithIdon the same link, scheduling another chain every cycle.Upstream rrweb has the same bug (rrweb-io/rrweb#1707, still open; fix attempt #1708 closed unmerged).
Changes
packages/rrweb/rrweb-snapshot/src/snapshot.ts:rel="preload"from the stylesheet predicate. Preload links are still serialized normally by the surrounding pass — we just stop scheduling a never-ending wait on them.serializeNodeWithIdcall inside the load callback with the already-computedserializedNodefrom the enclosing scope.onceStylesheetLoadedidempotent via a module-levelWeakSet; share onefiredflag between the timer path and the load path; pass{ once: true }so the listener self-removes after firing.Tests:
packages/rrweb/rrweb-snapshot/test/snapshot.test.ts— three new jsdom unit tests covering preload non-tracking, stylesheet load-firing-once, and synthetic-load-event no-multiplication.packages/browser/playwright/mocked/session-recording/session-recording-preload-link-leak.spec.ts+packages/browser/playground/preload-link-leak/index.html— real-browser repro that instrumentsHTMLLinkElement.prototype.addEventListener, lets the recorder run for two 6-second windows with a syntheticloaddispatch in between, and asserts the load-add count stays bounded. Without the fix: 30 leaked listeners. With the fix: 0.Release info Sub-libraries affected
Libraries affected
Checklist
If releasing new changes
pnpm changesetto generate a changeset fileCreated with PostHog Code