
CS-11030: cross-process transpile coalesce via module_transpile_cache#4755

Merged
lukemelia merged 6 commits into main from cs-11030-cross-process-transpile-coalesce-for-realmmodulecache
May 12, 2026

lukemelia commented May 11, 2026

Summary

  • New module_transpile_cache table (UNLOGGED, i.e. not WAL-logged, like the modules definition cache). Keyed on (realm_url, canonical_path). Columns: nullable body/headers/dependency_keys (so tombstones can sit in the row without bytes), generation BIGINT NOT NULL DEFAULT 0, created_at.
  • Lifts Realm.#moduleCache from a purely in-memory L1 to L1+L2. L1 stays in-process for hot-path latency; L2 is the cross-process shared layer.
  • Optimistic-concurrency (OCC) writes. The writer captures the row's generation at the L2 read step (or 0 if absent), transpiles, then UPSERTs with that captured value via ON CONFLICT DO UPDATE WHERE existing.generation <= EXCLUDED.generation. An invalidate that lands during the transpile bumps the row past the captured value, so the writer's UPSERT is rejected by the WHERE clause and a stale transpile started before the invalidate cannot resurrect the row.
  • Tombstone-and-bump on invalidate. #deleteTranspileCacheRow UPSERTs body=NULL, generation = generation+1 rather than physically DELETEing. Physical DELETE would let a slow in-flight writer's INSERT succeed (no conflict, no row to compare against) and resurrect the stale bytes. Bulk wipe (#deleteAllTranspileCacheRows from __testOnlyClearCaches) likewise UPDATEs all rows to tombstones with bumped gen.
  • Read path: L1 miss → in-flight dedup (CS-11029) → L2 read → coordinator winner/loser → babel → L2 OCC-protected persist.
  • Coordinator: reuses the existing ModuleCacheCoordinator (CS-10953) and MODULE_CACHE_POPULATED_CHANNEL. The coalesce key prefix (transpile|... vs the CachingDefinitionLookup shape) keeps the two flows separate; waiters dispatch off the bounded int64 hash, so crosstalk is a benign hash miss.
  • Invalidation: every CS-11028 invalidation site (writeMany, delete/deleteAll, file-watcher callback, handleExecutableInvalidations cascade, full-index clear, public invalidateCache) now tombstones the matching L2 row, fire-and-forget, alongside the in-memory drop. Every peer's listener runs the same tombstone-and-bump on its own copy → self-healing on transient pg blips and idempotent on the bump.
  • main.ts passes the existing moduleCacheCoordinator as transpileCoordinator to every Realm it constructs.
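To make the OCC and tombstone bullets above concrete, here is a minimal in-memory TypeScript model. It is a sketch, not the shipped code: `TranspileCacheModel` and its methods are hypothetical, a `Map` stands in for the Postgres table, only `body` and `generation` are modeled, and a tombstone on an absent row is assumed to insert it at generation 1.

```typescript
// Minimal in-memory model of the module_transpile_cache OCC protocol.
// Hypothetical: a Map stands in for the Postgres table; only body and generation are modeled.

type CacheRow = {
  body: string | null; // null marks a tombstone
  generation: number;
};

class TranspileCacheModel {
  private rows = new Map<string, CacheRow>();

  private key(realmURL: string, canonicalPath: string): string {
    return `${realmURL}|${canonicalPath}`;
  }

  // L2 read. On a miss the returned generation is the OCC token;
  // an absent row counts as generation 0.
  read(realmURL: string, canonicalPath: string): CacheRow {
    return this.rows.get(this.key(realmURL, canonicalPath)) ?? { body: null, generation: 0 };
  }

  // Models INSERT ... ON CONFLICT DO UPDATE ... WHERE existing.generation <= EXCLUDED.generation.
  // Returns false when the WHERE clause would reject the update.
  occWrite(realmURL: string, canonicalPath: string, body: string, capturedGen: number): boolean {
    const k = this.key(realmURL, canonicalPath);
    const existing = this.rows.get(k);
    if (existing !== undefined && existing.generation > capturedGen) {
      return false;
    }
    this.rows.set(k, { body, generation: capturedGen });
    return true;
  }

  // Tombstone-and-bump: body = NULL, generation = generation + 1 (absent row assumed to be 0).
  tombstone(realmURL: string, canonicalPath: string): void {
    const k = this.key(realmURL, canonicalPath);
    const previous = this.rows.get(k)?.generation ?? 0;
    this.rows.set(k, { body: null, generation: previous + 1 });
  }
}

// The invalidate-during-transpile race:
const cache = new TranspileCacheModel();
const realm = "https://realm.example/";
const token = cache.read(realm, "card.gts").generation; // miss, token = 0
cache.tombstone(realm, "card.gts"); // invalidate lands mid-transpile: generation becomes 1
const staleLanded = cache.occWrite(realm, "card.gts", "stale bytes", token); // false
const freshToken = cache.read(realm, "card.gts").generation; // 1
const freshLanded = cache.occWrite(realm, "card.gts", "fresh bytes", freshToken); // true
```

The writer never compares bytes or timestamps; the single integer captured at read time is enough to decide whether an invalidate happened in between.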

The CS-11028 generation guard protects the in-memory L1 write; the new DB-resident generation column protects the L2 write. Together they close both layers of the invalidate-during-transpile race.

Linear: https://linear.app/cardstack/issue/CS-11030

Test plan

  • CI passes the "Realm.#moduleCache L2 module_transpile_cache (DB-backed)" test module:
    • fresh transpile populates module_transpile_cache
    • L2 row serves a subsequent reader after L1 wipe (no re-transpile)
    • invalidateCache tombstones the L2 row and bumps generation
    • in-flight transpile that completes after invalidate cannot resurrect the L2 row (direct OCC guard exercise)
  • Existing CS-11028 + CS-11029 race/dedup tests still pass.
  • Existing CS-10953 ModuleCacheCoordinator tests still pass — the coordinator is unchanged; we just route a second flow through it.
  • Module-cache regression tests in realm-endpoints-test.ts still pass (etag, 304, content-type).

Follow-ups (NOT in this PR)

  • Two-instance integration test mirroring module-cache-coordination-test.ts for the transpile flow (gated babel + advisory-lock contention → exactly one transpile across both instances).
  • Bulk-wipe edge case: #deleteAllTranspileCacheRows bumps existing rows but doesn't tombstone paths that didn't yet have a row, so an in-flight writer for one of those paths could still resurrect. Narrow, only reachable from __testOnlyClearCaches.
  • Periodic tombstone GC if the table grows materially (UNLOGGED, but still bounded only by total path cardinality).
  • Carry dependency_keys through the L2 write so cross-process load skips the extractModuleDependencyKeys recomputation in fallbackHandle.
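The bulk-wipe gap in the second follow-up can be reproduced with the same kind of in-memory model. This is a hypothetical sketch: `bulkWipe` stands in for #deleteAllTranspileCacheRows and is assumed to be an UPDATE over existing rows only, and a `Map` stands in for the table.

```typescript
// Hypothetical sketch of the bulk-wipe gap: only rows that already exist get bumped.

type CacheRow = { body: string | null; generation: number };
const rows = new Map<string, CacheRow>();

// Same OCC rule as the per-row write: only an existing, newer row can reject it.
function occWrite(path: string, body: string, capturedGen: number): boolean {
  const existing = rows.get(path);
  if (existing !== undefined && existing.generation > capturedGen) return false;
  rows.set(path, { body, generation: capturedGen });
  return true;
}

// Models the bulk wipe: tombstone and bump every existing row, create none.
function bulkWipe(): void {
  rows.forEach((row, path) => {
    rows.set(path, { body: null, generation: row.generation + 1 });
  });
}

rows.set("a.gts", { body: null, generation: 0 });
const tokenA = 0; // writer for a.gts read a tombstone at generation 0
const tokenB = 0; // writer for b.gts found no row, so its token is 0
bulkWipe(); // a.gts moves to generation 1; b.gts still has no row
const aLanded = occWrite("a.gts", "stale a", tokenA); // false: rejected by the bump
const bLanded = occWrite("b.gts", "stale b", tokenB); // true: plain INSERT, stale bytes resurrected
```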

🤖 Generated with Claude Code


github-actions Bot commented May 11, 2026

Host Test Results

1 file, 1 suite, 2h 3m 48s ⏱️
2,658 tests: 2,643 ✅ passed, 15 💤 skipped, 0 ❌ failed
2,677 runs: 2,662 ✅ passed, 15 💤 skipped, 0 ❌ failed

Results for commit 66ead90.

Realm Server Test Results

1 file (±0), 1 suite (±0), 11m 8s ⏱️ (-2m 37s)
1,334 tests (+7): 1,334 ✅ passed (+8), 0 💤 skipped (±0), 0 ❌ failed (-1)
1,413 runs (+7): 1,413 ✅ passed (+8), 0 💤 skipped (±0), 0 ❌ failed (-1)

Results for commit 66ead90. ± Comparison against earlier commit 62a2c38.

lukemelia force-pushed the cs-11029-in-process-inflight-transpile-dedup-for-realmmodulecache branch from 5e2aa2b to 53669fe May 11, 2026 12:05
lukemelia force-pushed the cs-11030-cross-process-transpile-coalesce-for-realmmodulecache branch from 635b61b to 98e454c May 11, 2026 12:07
lukemelia force-pushed the cs-11029-in-process-inflight-transpile-dedup-for-realmmodulecache branch from 53669fe to e687df6 May 11, 2026 12:19
lukemelia force-pushed the cs-11030-cross-process-transpile-coalesce-for-realmmodulecache branch 2 times, most recently from feb1e11 to 333b135 May 11, 2026 12:37
lukemelia force-pushed the cs-11029-in-process-inflight-transpile-dedup-for-realmmodulecache branch from e687df6 to 53f4120 May 11, 2026 16:16
lukemelia force-pushed the cs-11030-cross-process-transpile-coalesce-for-realmmodulecache branch from 333b135 to bd24b79 May 11, 2026 16:17
Lifts Realm.#moduleCache from a purely in-memory L1 to a two-layer
L1+L2 system. The new L2 is a Postgres-backed module_transpile_cache
table (UNLOGGED, i.e. not WAL-logged, like the modules definition cache) keyed
on (realm_url, canonical_path) so peer realm-servers in the same fleet
can re-read a transpile produced by any other peer instead of each
running babel independently.

Schema: nullable body/headers/dependency_keys (so tombstones can sit
in the row without bytes) + a `generation` BIGINT NOT NULL DEFAULT 0
column that protects the L2 write path against the invalidate-during-
transpile race. CS-11028's generation guard covers L1; the durable
column extends the same protocol to L2 across peers.

When a fresh transpile is needed:
  1. Read module_transpile_cache. If a peer (or this process on an
     earlier request) produced bytes, return them with no babel.
     Otherwise capture the row's `generation` (or 0 if absent) — that
     value becomes the OCC token for the eventual write.
  2. With a coordinator: tryAcquireAndRun on coalesceKey
     `transpile|<realmURL>|<canonicalPath>`. The winner re-reads the
     row (peer may have written between the miss and the win), then
     transpiles + UPSERTs with the captured generation, then emits
     NOTIFY on the same MODULE_CACHE_POPULATED_CHANNEL CS-10953 already
     uses for definition-cache coalesce. Losers waitForKey + re-read;
     on missed NOTIFY they fall through to a local transpile + persist.
  3. Without a coordinator (sqlite / in-memory deployments): direct
     transpile + OCC-protected L2 persist.
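The three steps above can be sketched end to end in a single process. This is an illustration under stated assumptions: the method names tryAcquireAndRun and waitForKey come from the commit message, but the signatures here are invented, `InProcessCoordinator` fakes the advisory lock and NOTIFY in memory, and `L2Table` and `getTranspiled` are hypothetical.

```typescript
// Hypothetical single-process sketch of the L2 read -> coalesce -> transpile -> OCC persist path.

type CacheRow = { body: string | null; generation: number };

class L2Table {
  private rows = new Map<string, CacheRow>();
  read(key: string): CacheRow {
    return this.rows.get(key) ?? { body: null, generation: 0 };
  }
  // OCC write: rejected when the stored generation is past the captured token.
  occWrite(key: string, body: string, capturedGen: number): boolean {
    const existing = this.rows.get(key);
    if (existing !== undefined && existing.generation > capturedGen) return false;
    this.rows.set(key, { body, generation: capturedGen });
    return true;
  }
}

interface Coordinator {
  tryAcquireAndRun(key: string, fn: () => Promise<void>): Promise<boolean>;
  waitForKey(key: string, timeoutMs: number): Promise<boolean>;
}

class InProcessCoordinator implements Coordinator {
  private held = new Set<string>();
  private waiters = new Map<string, Array<() => void>>();

  async tryAcquireAndRun(key: string, fn: () => Promise<void>): Promise<boolean> {
    if (this.held.has(key)) return false; // like a failed try-lock
    this.held.add(key);
    try {
      await fn();
    } finally {
      this.held.delete(key);
      const wake = this.waiters.get(key) ?? []; // stands in for NOTIFY
      this.waiters.delete(key);
      wake.forEach((w) => w());
    }
    return true;
  }

  waitForKey(key: string, timeoutMs: number): Promise<boolean> {
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      const list = this.waiters.get(key) ?? [];
      list.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
      this.waiters.set(key, list);
    });
  }
}

async function getTranspiled(
  coordinator: Coordinator | undefined,
  l2: L2Table,
  realmURL: string,
  canonicalPath: string,
  transpile: () => Promise<string>,
): Promise<string> {
  const rowKey = `${realmURL}|${canonicalPath}`;
  // Step 1: L2 read; on a miss the row's generation becomes the OCC token.
  const first = l2.read(rowKey);
  if (first.body !== null) return first.body;
  const token = first.generation;
  const transpileAndPersist = async (): Promise<string> => {
    const body = await transpile();
    l2.occWrite(rowKey, body, token); // a rejected write is fine: the caller still has body
    return body;
  };
  // Step 3: no coordinator, so transpile and persist directly.
  if (coordinator === undefined) return transpileAndPersist();
  // Step 2: coalesce on a prefixed key; the winner re-reads before transpiling.
  const coalesceKey = `transpile|${realmURL}|${canonicalPath}`;
  const result: { body?: string } = {};
  const won = await coordinator.tryAcquireAndRun(coalesceKey, async () => {
    const again = l2.read(rowKey); // a peer may have written since our miss
    result.body = again.body ?? (await transpileAndPersist());
  });
  if (won && result.body !== undefined) return result.body;
  // Loser: wait for the winner's notification, then re-read L2.
  await coordinator.waitForKey(coalesceKey, 1000);
  const after = l2.read(rowKey);
  if (after.body !== null) return after.body;
  return transpileAndPersist(); // missed notification: fall back to a local transpile
}
```

Two concurrent calls for the same path through one `InProcessCoordinator` should run the `transpile` callback once: the second call loses the try-lock, parks in `waitForKey`, and is served from the L2 row the winner wrote.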

L2 writes are guarded by the OCC clause `ON CONFLICT DO UPDATE WHERE
existing.generation <= EXCLUDED.generation`. An invalidate that lands
during the transpile bumps the row past the captured value, so the
WHERE rejects the UPSERT and a stale transpile started before the
invalidate cannot resurrect the row. Mirrors CS-11028's L1 guard with
a durable counter visible to every peer.
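For reference, the guarded UPSERT could be built as a parameterized statement like the one below. This is a hypothetical sketch: the column names come from the PR description, while the `existing` table alias (Postgres accepts `INSERT INTO ... AS alias`), the unique constraint on (realm_url, canonical_path), the JSON-text encoding of headers and dependency_keys, and the `{ text, values }` shape used by node-postgres are all assumptions.

```typescript
// Hypothetical builder for the OCC-guarded L2 write.

interface TranspileWrite {
  realmURL: string;
  canonicalPath: string;
  body: string;
  headers: Record<string, string>;
  dependencyKeys: string[];
  capturedGeneration: number; // generation observed at the L2 read step, or 0
}

function buildOccUpsert(w: TranspileWrite): { text: string; values: unknown[] } {
  const text = `
    INSERT INTO module_transpile_cache AS existing
      (realm_url, canonical_path, body, headers, dependency_keys, generation)
    VALUES ($1, $2, $3, $4, $5, $6)
    ON CONFLICT (realm_url, canonical_path) DO UPDATE SET
      body = EXCLUDED.body,
      headers = EXCLUDED.headers,
      dependency_keys = EXCLUDED.dependency_keys,
      generation = EXCLUDED.generation
    WHERE existing.generation <= EXCLUDED.generation`;
  return {
    text,
    values: [
      w.realmURL,
      w.canonicalPath,
      w.body,
      JSON.stringify(w.headers),
      JSON.stringify(w.dependencyKeys),
      w.capturedGeneration,
    ],
  };
}
```

When the WHERE clause rejects the update, Postgres leaves the row untouched and reports zero affected rows, so a caller that cares can detect a lost race from the row count.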

Invalidation paths — invalidateCache, writeMany, delete/deleteAll,
the local file-watcher callback, handleExecutableInvalidations, the
full-index clear — already route through the CS-11028 helpers. Those
now fire-and-forget a tombstone-and-bump UPSERT against
module_transpile_cache (body=NULL, generation += 1) alongside the
in-memory drop. Physical DELETE would let a slow in-flight writer's
INSERT succeed unconflicted and resurrect the stale bytes; the
tombstone leaves a non-null `generation` for the OCC WHERE clause to
compare against. Fire-and-forget is acceptable because every peer's
listener runs the same tombstone for its own copy (self-healing on
transient pg blips) and the bump is idempotent.
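The argument against physical DELETE can be checked by running the same race under both invalidation strategies. In this hypothetical sketch the row starts as a tombstone at generation 3, so the writer misses and captures token 3; the helper names are illustrative and a `Map` stands in for the table.

```typescript
// Hypothetical contrast of DELETE vs tombstone-and-bump under the same OCC write rule.

type CacheRow = { body: string | null; generation: number };
type Table = Map<string, CacheRow>;

function occWrite(rows: Table, path: string, body: string, capturedGen: number): boolean {
  const existing = rows.get(path);
  // With no row there is nothing to compare against, so the write is a plain INSERT.
  if (existing !== undefined && existing.generation > capturedGen) return false;
  rows.set(path, { body, generation: capturedGen });
  return true;
}

function invalidateByDelete(rows: Table, path: string): void {
  rows.delete(path);
}

function invalidateByTombstone(rows: Table, path: string): void {
  const previous = rows.get(path)?.generation ?? 0;
  rows.set(path, { body: null, generation: previous + 1 });
}

function race(invalidate: (rows: Table, path: string) => void): string | null {
  const rows: Table = new Map<string, CacheRow>([["card.gts", { body: null, generation: 3 }]]);
  const token = rows.get("card.gts")?.generation ?? 0; // writer misses, captures 3
  invalidate(rows, "card.gts"); // the source changes while babel runs
  occWrite(rows, "card.gts", "stale bytes", token); // the slow writer finishes
  return rows.get("card.gts")?.body ?? null;
}

const afterDelete = race(invalidateByDelete); // "stale bytes": the stale transpile is back
const afterTombstone = race(invalidateByTombstone); // null: the OCC check rejected it
```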

L2 persist is best-effort by design: a transient pg failure is
logged via realm.#log.warn but doesn't surface to the caller — the
caller already has the bytes in memory.
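A best-effort write boils down to catching and logging at the persistence boundary. The helpers below are hypothetical: `Logger`, `persistBestEffort`, `fireAndForget`, and `serve` are illustrative names, and the PR only states that failures are logged via realm.#log.warn.

```typescript
// Hypothetical best-effort persistence helpers.

interface Logger {
  warn(message: string): void;
}

// Awaitable variant: a failed L2 write is logged and swallowed, never rethrown,
// because the caller already holds the transpiled bytes.
async function persistBestEffort(write: () => Promise<void>, log: Logger, what: string): Promise<void> {
  try {
    await write();
  } catch (err) {
    log.warn(`module_transpile_cache write failed for ${what}: ${String(err)}`);
  }
}

// Fire-and-forget variant for invalidation paths: start the write and return immediately.
function fireAndForget(write: () => Promise<void>, log: Logger, what: string): void {
  void persistBestEffort(write, log, what);
}

// A request path keeps serving the bytes even when the L2 write fails.
async function serve(bytes: string, log: Logger): Promise<string> {
  await persistBestEffort(async () => {
    throw new Error("connection reset");
  }, log, "card.gts");
  return bytes;
}
```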

Wiring: main.ts passes the existing moduleCacheCoordinator as
transpileCoordinator (the same instance powers both the prerender
coalesce and the transpile coalesce — coalesceKey prefixes keep the
two flows separate). Tests cover the L2 read/write/tombstone plumbing
and directly exercise the OCC WHERE clause; the coalesce semantics
themselves are covered by the CS-10953 ModuleCacheCoordinator tests
already shipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukemelia force-pushed the cs-11030-cross-process-transpile-coalesce-for-realmmodulecache branch from bd24b79 to 82d1990 May 11, 2026 17:56
lukemelia changed the base branch from cs-11029-in-process-inflight-transpile-dedup-for-realmmodulecache to main May 11, 2026 18:39
lukemelia and others added 5 commits May 11, 2026 14:46
The new migration in this PR added module_transpile_cache to Postgres
but the host-side SQLite schema (used by browser tests) wasn't
regenerated, so #readTranspileCacheRow's SELECT threw
SQLITE_ERROR: no such table on every transpile and the error bubbled
up as HTTP 406 "Module transpilation failed" for every .gts request.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes two follow-ups from the PR description.

Carry dependency_keys through the L2 row so a cross-process L2 hit
skips the extractModuleDependencyKeys AST scan that fallbackHandle's
L1 write used to redo on every miss. ModuleTranspileResult now
includes the Set<string> computed once at the transpile/L2-hit
boundary; #materializeAndTranspile populates it; #writeTranspileCacheRow
persists it as a JSON string array; #readTranspileCacheRow parses it
back. fallbackHandle's L1 write site reuses result.dependencyKeys
instead of calling extractModuleDependencyKeys. Rows written before
this change come back as an empty Set during the rollout window; the
table is UNLOGGED, so pre-rollout rows are discarded after any crash or
unclean pg shutdown.
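The round trip described above fits in two small helpers. This is a hypothetical sketch: the function names are illustrative, the column is assumed to hold JSON text that the driver returns as a string or NULL, and sorting the keys is an extra choice so that equal sets serialize identically.

```typescript
// Hypothetical helpers for carrying dependency_keys through the L2 row.

function serializeDependencyKeys(keys: Set<string>): string {
  return JSON.stringify(Array.from(keys).sort()); // sorted: equal sets give equal text
}

function parseDependencyKeys(raw: string | null): Set<string> {
  // Rows written before the column was populated read back as an empty set.
  if (raw === null) return new Set<string>();
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || !parsed.every((k) => typeof k === "string")) {
    return new Set<string>(); // unexpected shape: degrade to "no known dependencies"
  }
  return new Set<string>(parsed);
}

const stored = serializeDependencyKeys(new Set(["https://realm.example/b", "https://realm.example/a"]));
const restored = parseDependencyKeys(stored); // Set with both URLs
const legacy = parseDependencyKeys(null); // empty Set
```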

Two-instance integration test mirroring the CachingDefinitionLookup
two-instance test for the transpile flow: two Realms sharing pg + dir,
each with its own ModuleCacheCoordinator, exercise (a) L2 row produced
by peer A is served from L2 by peer B without re-running babel and
(b) concurrent transpile from both peers coalesces through the
advisory-lock + NOTIFY channel — exactly one babel call across both.
Plumbs an optional transpileCoordinator through createRealm so each
peer's coordinator hits the same pg LISTEN channel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
testRealm.write + __testOnlyClearCaches each tombstone-and-bump the
target path's L2 row during setup, so the row's generation when the
test reaches its explicit tombstone INSERT is unpredictable (was 5
when CI expected 1). The OCC guard property under test — "stale write
captured generation N can't UPSERT when the row is past N" — doesn't
depend on starting from gen=0.

Capture the gen before the explicit tombstone, assert the tombstone
bumped it, run the stale write, assert the gen stayed at the post-
tombstone value. The semantic property still holds; the test is now
robust to setup fan-out.
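The relative-generation pattern looks like this outside QUnit. In this hypothetical sketch a `check` helper stands in for the assertions, with one condition per check (matching the later lint fix), and a loop of tombstones simulates setup fan-out of unknown size.

```typescript
// Hypothetical sketch: assert relative to the generation seen just before the tombstone.

type CacheRow = { body: string | null; generation: number };
const rows = new Map<string, CacheRow>();

function tombstone(path: string): void {
  const previous = rows.get(path)?.generation ?? 0;
  rows.set(path, { body: null, generation: previous + 1 });
}

function occWrite(path: string, body: string, capturedGen: number): boolean {
  const existing = rows.get(path);
  if (existing !== undefined && existing.generation > capturedGen) return false;
  rows.set(path, { body, generation: capturedGen });
  return true;
}

// One condition per check, like the split assertions.
function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// Setup fan-out (writes, cache clears) bumps the row an unpredictable number of times.
for (let i = 0; i < 4; i++) tombstone("card.gts");

const preTombstoneGen = rows.get("card.gts")?.generation ?? 0;
tombstone("card.gts"); // the test's explicit tombstone
const postTombstoneGen = rows.get("card.gts")?.generation;
check(postTombstoneGen !== undefined, "row should exist after the tombstone");
check((postTombstoneGen ?? 0) > preTombstoneGen, "tombstone should bump the generation");

// A stale writer that captured the pre-tombstone generation must not land.
const landed = occWrite("card.gts", "stale bytes", preTombstoneGen);
check(!landed, "stale write should be rejected");
check(rows.get("card.gts")?.generation === postTombstoneGen, "generation should stay put");
```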

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two CI fixes:

1. Lint (qunit/no-assert-logical-expression at line 887): split the
   compound `postTombstoneGen != null && postTombstoneGen > preTombstoneGen`
   assertion into a `notStrictEqual(..., undefined)` followed by an
   `assert.ok` on the delta. Same semantic check; satisfies the
   single-condition assertion rule.

2. Flaky test ("identity-checked cleanup: A in-flight + invalidate +
   B in-flight + A settles → B survives"): `waitForInflight(1)`
   confirms only that the in-flight slot is set, which happens BEFORE
   the transpile hook fires. Under CI load A could still be racing
   toward the hook when the test ran `currentGate = gateB`, so when
   A finally invoked the closure `() => currentGate` it captured
   `gateB` instead of `gateA`. `releaseA()` then released a gate A
   wasn't parked at, A's request never responded, and the test timed
   out at 60s with "socket hang up." Replace the mutable reference
   with call-index routing inside the hook and synchronize on
   explicit per-call entry signals so the order is deterministic.
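Fix 2 can be sketched as a gated hook that picks its gate by arrival order when it is entered, instead of reading a mutable `currentGate` at whatever moment it happens to run. The helpers `deferred` and `makeGatedHook` are hypothetical, not the test's actual code.

```typescript
// Hypothetical sketch of call-index routing with per-call entry signals.

function deferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve!: () => void;
  const promise = new Promise<void>((r) => {
    resolve = () => r();
  });
  return { promise, resolve };
}

function makeGatedHook(callCount: number) {
  const entered = Array.from({ length: callCount }, () => deferred());
  const gates = Array.from({ length: callCount }, () => deferred());
  let nextIndex = 0;
  const hook = async (): Promise<number> => {
    const i = nextIndex++; // gate is fixed by arrival order at entry
    entered[i].resolve(); // tell the test this call is parked at its gate
    await gates[i].promise;
    return i;
  };
  return { hook, entered, gates };
}
```

A test then awaits `entered[0].promise` before starting the second request, so releasing `gates[0]` can only ever release the first caller.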

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukemelia marked this pull request as ready for review May 12, 2026 16:45
lukemelia requested review from a team and habdelra May 12, 2026 16:45
lukemelia merged commit 04332de into main May 12, 2026
102 of 103 checks passed

2 participants