CS-11030: cross-process transpile coalesce via module_transpile_cache #4755
Merged
lukemelia merged 6 commits on May 12, 2026
Preview deployments
Host Test Results: 1 files, 1 suites, 2h 3m 48s ⏱️. Results for commit 66ead90.
Realm Server Test Results: 1 files ±0, 1 suites ±0, 11m 8s ⏱️ −2m 37s. Results for commit 66ead90.
± Comparison against earlier commit 62a2c38.
Lifts Realm.#moduleCache from a purely in-memory L1 to a two-layer
L1+L2 system. The new L2 is a Postgres-backed module_transpile_cache
table (UNLOGGED, RAM-backed, like the modules definition cache) keyed
on (realm_url, canonical_path) so peer realm-servers in the same fleet
can re-read a transpile produced by any other peer instead of each
running babel independently.
Schema: nullable body/headers/dependency_keys (so tombstones can sit
in the row without bytes) + a `generation` BIGINT NOT NULL DEFAULT 0
column that protects the L2 write path against the invalidate-during-
transpile race. CS-11028's generation guard covers L1; the durable
column extends the same protocol to L2 across peers.
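The row shape and its tombstone encoding can be sketched in TypeScript. This is a hypothetical in-memory view, not the PR's code: the camel-cased field names and the types chosen for headers and dependency_keys are assumptions. It shows how a reader can tell a live row, a tombstone, and an absent row apart, and where the OCC token comes from in each case.

```typescript
// Hypothetical in-memory view of a module_transpile_cache row.
// Column names follow the commit text; the TypeScript types are assumptions.
interface TranspileCacheRow {
  realmURL: string;
  canonicalPath: string;
  body: string | null; // NULL on a tombstone
  headers: Record<string, string> | null; // NULL on a tombstone
  dependencyKeys: string[] | null; // NULL on a tombstone
  generation: number; // BIGINT NOT NULL DEFAULT 0
}

type L2Lookup =
  | { kind: "hit"; body: string; generation: number }
  | { kind: "miss"; occToken: number };

// Classify an L2 read. A tombstone is a miss that still carries a generation,
// which becomes the OCC token for the eventual write; an absent row yields 0.
function classifyRow(row: TranspileCacheRow | undefined): L2Lookup {
  if (row === undefined) return { kind: "miss", occToken: 0 };
  if (row.body === null) return { kind: "miss", occToken: row.generation };
  return { kind: "hit", body: row.body, generation: row.generation };
}
```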
When a fresh transpile is needed:
1. Read module_transpile_cache. If a peer (or this process on an
earlier request) produced bytes, return them with no babel.
Otherwise capture the row's `generation` (or 0 if absent) — that
value becomes the OCC token for the eventual write.
2. With a coordinator: tryAcquireAndRun on coalesceKey
`transpile|<realmURL>|<canonicalPath>`. The winner re-reads the
row (peer may have written between the miss and the win), then
transpiles + UPSERTs with the captured generation, then emits
NOTIFY on the same MODULE_CACHE_POPULATED_CHANNEL CS-10953 already
uses for definition-cache coalesce. Losers waitForKey + re-read;
on missed NOTIFY they fall through to a local transpile + persist.
3. Without a coordinator (sqlite / in-memory deployments): direct
transpile + OCC-protected L2 persist.
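The three steps above can be sketched as one function. Everything here is hypothetical: Store, Coordinator, MemoryStore, LocalCoordinator and transpileWithL2 are stand-ins invented for illustration, and the in-process coordinator only imitates the advisory lock and NOTIFY. The method names tryAcquireAndRun and waitForKey are borrowed from the text, but the signatures given to them here are assumptions.

```typescript
// Hypothetical stand-ins: the real code uses pg helpers on Realm and the
// CS-10953 ModuleCacheCoordinator; only the control flow mirrors the text.
interface Row { body: string | null; generation: number }

interface Store {
  read(key: string): Promise<Row | undefined>;
  // OCC UPSERT: lands only if there is no row or existing.generation <= gen.
  write(key: string, body: string, gen: number): Promise<boolean>;
}

interface Coordinator {
  // Runs fn and resolves true if this caller won the key, false if it is held.
  tryAcquireAndRun(key: string, fn: () => Promise<void>): Promise<boolean>;
  // Resolves true when the holder finishes (the NOTIFY stand-in), false if missed.
  waitForKey(key: string): Promise<boolean>;
}

class MemoryStore implements Store {
  rows = new Map<string, Row>();
  async read(key: string): Promise<Row | undefined> {
    return this.rows.get(key);
  }
  async write(key: string, body: string, gen: number): Promise<boolean> {
    const existing = this.rows.get(key);
    if (existing !== undefined && existing.generation > gen) return false;
    this.rows.set(key, { body, generation: gen });
    return true;
  }
}

// In-process stand-in for the advisory lock plus LISTEN/NOTIFY.
class LocalCoordinator implements Coordinator {
  private held = new Set<string>();
  private waiters = new Map<string, Array<(notified: boolean) => void>>();

  async tryAcquireAndRun(key: string, fn: () => Promise<void>): Promise<boolean> {
    if (this.held.has(key)) return false;
    this.held.add(key);
    try {
      await fn();
    } finally {
      this.held.delete(key);
      for (const wake of this.waiters.get(key) ?? []) wake(true); // "NOTIFY"
      this.waiters.delete(key);
    }
    return true;
  }

  waitForKey(key: string): Promise<boolean> {
    if (!this.held.has(key)) return Promise.resolve(false); // holder done: missed NOTIFY
    return new Promise<boolean>((resolve) => {
      this.waiters.set(key, [...(this.waiters.get(key) ?? []), resolve]);
    });
  }
}

async function transpileWithL2(
  store: Store,
  coordinator: Coordinator | undefined,
  realmURL: string,
  canonicalPath: string,
  babel: () => Promise<string>,
): Promise<string> {
  const rowKey = `${realmURL}|${canonicalPath}`;
  // 1. L2 read; on a miss, capture the OCC token (0 if the row is absent).
  const first = await store.read(rowKey);
  if (first?.body != null) return first.body;
  const token = first?.generation ?? 0;

  const transpileAndPersist = async (): Promise<string> => {
    const body = await babel();
    await store.write(rowKey, body, token); // rejected if an invalidate bumped the row
    return body;
  };

  // 3. No coordinator (sqlite / in-memory): transpile directly.
  if (coordinator === undefined) return transpileAndPersist();

  // 2. Coalesce across peers on the transpile key.
  const coalesceKey = `transpile|${realmURL}|${canonicalPath}`;
  const won: { body?: string } = {};
  const acquired = await coordinator.tryAcquireAndRun(coalesceKey, async () => {
    const again = await store.read(rowKey); // a peer may have written since the miss
    won.body = again?.body != null ? again.body : await transpileAndPersist();
  });
  if (acquired && won.body !== undefined) return won.body;

  // Loser: wait for the winner and re-read; fall through on a missed signal.
  if (await coordinator.waitForKey(coalesceKey)) {
    const after = await store.read(rowKey);
    if (after?.body != null) return after.body;
  }
  return transpileAndPersist();
}
```

With two concurrent callers sharing one MemoryStore and one LocalCoordinator, the loser parks in waitForKey and then re-reads the row the winner wrote, so babel runs once.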
L2 writes are guarded by the OCC clause `ON CONFLICT DO UPDATE WHERE
existing.generation <= EXCLUDED.generation`. An invalidate that lands
during the transpile bumps the row past the captured value, so the
WHERE rejects the UPSERT and a stale transpile started before the
invalidate cannot resurrect the row. Mirrors CS-11028's L1 guard with
a durable counter visible to every peer.
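A minimal in-memory model of the OCC clause makes the race concrete. The Map stands in for module_transpile_cache, and occUpsert and invalidate are hypothetical helpers. The assumption that an invalidate on an absent row inserts a tombstone at generation 1 belongs to this sketch, not to the PR.

```typescript
// Hypothetical in-memory model of the OCC UPSERT against module_transpile_cache:
//   INSERT ... ON CONFLICT (realm_url, canonical_path) DO UPDATE ...
//   WHERE existing.generation <= EXCLUDED.generation
type Row = { body: string | null; generation: number };

function occUpsert(table: Map<string, Row>, key: string, incoming: Row): boolean {
  const existing = table.get(key);
  if (existing === undefined) {
    table.set(key, incoming); // no conflict: plain INSERT
    return true;
  }
  if (existing.generation <= incoming.generation) {
    table.set(key, incoming); // conflict and WHERE passes: UPDATE
    return true;
  }
  return false; // conflict and WHERE fails: row untouched
}

// Invalidate as tombstone-and-bump. Assumption: an absent row is inserted at
// generation 1 so that a writer holding token 0 is still rejected.
function invalidate(table: Map<string, Row>, key: string): void {
  const gen = table.get(key)?.generation ?? 0;
  table.set(key, { body: null, generation: gen + 1 });
}

// The race: capture the token, an invalidate lands mid-transpile, the stale write arrives.
const table = new Map<string, Row>();
const key = "http://realm.example/|a.gts";
const captured = table.get(key)?.generation ?? 0; // token 0 (row absent)
invalidate(table, key); // row is now a tombstone at generation 1
const staleLanded = occUpsert(table, key, { body: "stale bytes", generation: captured });
```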
Invalidation paths — invalidateCache, writeMany, delete/deleteAll,
the local file-watcher callback, handleExecutableInvalidations, the
full-index clear — already route through the CS-11028 helpers. Those
now fire-and-forget a tombstone-and-bump UPSERT against
module_transpile_cache (body=NULL, generation += 1) alongside the
in-memory drop. Physical DELETE would let a slow in-flight writer's
INSERT succeed unconflicted and resurrect the stale bytes; the
tombstone leaves a non-null `generation` for the OCC WHERE clause to
compare against. Fire-and-forget is acceptable because every peer's
listener runs the same tombstone for its own copy (self-healing on
transient pg blips) and the bump is idempotent.
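The difference between a physical DELETE and the tombstone can be shown with the same kind of in-memory model (hypothetical helpers, not the PR's code): a slow writer captures a generation, an invalidate lands, and then the stale write arrives.

```typescript
// Hypothetical in-memory model contrasting a physical DELETE with the
// tombstone-and-bump UPSERT while a slow writer is still transpiling.
type Row = { body: string | null; generation: number };

// OCC UPSERT: insert when absent, update only if existing.generation <= incoming.
function occUpsert(table: Map<string, Row>, key: string, incoming: Row): boolean {
  const existing = table.get(key);
  if (existing !== undefined && existing.generation > incoming.generation) return false;
  table.set(key, incoming);
  return true;
}

function physicalDelete(table: Map<string, Row>, key: string): void {
  table.delete(key); // nothing left for the WHERE clause to compare against
}

function tombstoneAndBump(table: Map<string, Row>, key: string): void {
  const gen = table.get(key)?.generation ?? 0;
  table.set(key, { body: null, generation: gen + 1 }); // body = NULL, generation += 1
}

// A slow writer captured generation 2 from an existing tombstone, then an
// invalidate lands before its write. Returns the row the table ends up with.
function race(invalidate: (t: Map<string, Row>, k: string) => void): Row | undefined {
  const table = new Map<string, Row>([["k", { body: null, generation: 2 }]]);
  const captured = table.get("k")?.generation ?? 0;
  invalidate(table, "k");
  occUpsert(table, "k", { body: "stale bytes", generation: captured });
  return table.get("k");
}
```

Running tombstoneAndBump once per peer only pushes the generation higher, which is why repeating it from every listener is harmless in this model.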
L2 persist is best-effort by design: a transient pg failure is
logged via realm.#log.warn but doesn't surface to the caller — the
caller already has the bytes in memory.
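A sketch of the best-effort persist, assuming an async write function and a logger with a warn method. Logger, persistBestEffort and serveTranspile are hypothetical names; the PR itself logs through realm.#log.warn.

```typescript
// Best-effort L2 persist: a failed write is logged and swallowed because the
// caller already holds the transpiled bytes. All names here are hypothetical.
interface Logger {
  warn(message: string, ...details: unknown[]): void;
}

async function persistBestEffort(
  write: () => Promise<unknown>,
  log: Logger,
  path: string,
): Promise<void> {
  try {
    await write();
  } catch (err) {
    log.warn(`module_transpile_cache persist failed for ${path}`, err);
  }
}

async function serveTranspile(
  path: string,
  babel: () => Promise<string>,
  writeL2: (body: string) => Promise<unknown>,
  log: Logger,
): Promise<string> {
  const body = await babel();
  await persistBestEffort(() => writeL2(body), log, path);
  return body; // returned whether or not the L2 write succeeded
}
```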
Wiring: main.ts passes the existing moduleCacheCoordinator as
transpileCoordinator (the same instance powers both the prerender
coalesce and the transpile coalesce — coalesceKey prefixes keep the
two flows separate). Tests cover the L2 read/write/tombstone plumbing
and directly exercise the OCC WHERE clause; the coalesce semantics
themselves are covered by the CS-10953 ModuleCacheCoordinator tests
already shipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
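Because both flows share one coordinator instance, the coalesce key is what keeps them apart. The `transpile|<realmURL>|<canonicalPath>` shape below is taken from the text. The 64-bit FNV-1a hash that turns it into a bounded int64 lock id is an assumption, since the PR does not say which hash ModuleCacheCoordinator uses (Postgres advisory locks do take a signed bigint key).

```typescript
// Coalesce key for the transpile flow (shape from the commit text).
function transpileCoalesceKey(realmURL: string, canonicalPath: string): string {
  return `transpile|${realmURL}|${canonicalPath}`;
}

// Assumed hash: 64-bit FNV-1a over UTF-16 code units, reinterpreted as a
// signed 64-bit integer so it fits a Postgres bigint advisory-lock key.
function lockIdFor(key: string): bigint {
  let hash = 0xcbf29ce484222325n;
  for (let i = 0; i < key.length; i++) {
    hash ^= BigInt(key.charCodeAt(i));
    hash = (hash * 0x100000001b3n) & 0xffffffffffffffffn;
  }
  return BigInt.asIntN(64, hash);
}

// A waiter only acts on a notification whose lock id matches its own key's id;
// ids from the other flow are ignored.
function isForMe(myKey: string, notifiedLockId: bigint): boolean {
  return lockIdFor(myKey) === notifiedLockId;
}
```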
The new migration in this PR added module_transpile_cache to Postgres, but the host-side SQLite schema (used by browser tests) wasn't regenerated, so #readTranspileCacheRow's SELECT threw SQLITE_ERROR: no such table on every transpile, and the error bubbled up as HTTP 406 "Module transpilation failed" for every .gts request.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes two follow-ups from the PR description.
Carry dependency_keys through the L2 row so a cross-process L2 hit skips the extractModuleDependencyKeys AST scan that fallbackHandle's L1 write used to redo on every miss. ModuleTranspileResult now includes the Set<string> computed once at the transpile/L2-hit boundary; #materializeAndTranspile populates it; #writeTranspileCacheRow persists it as a JSON string array; #readTranspileCacheRow parses it back. fallbackHandle's L1 write site reuses result.dependencyKeys instead of calling extractModuleDependencyKeys. Rows written before this change come back as an empty Set during the rollout window; the table is UNLOGGED, so pre-rollout rows age out on any pg restart.
Two-instance integration test for the transpile flow, mirroring the CachingDefinitionLookup two-instance test: two Realms sharing pg + dir, each with its own ModuleCacheCoordinator, verify that (a) an L2 row produced by peer A is served from L2 by peer B without re-running babel, and (b) concurrent transpiles from both peers coalesce through the advisory-lock + NOTIFY channel, with exactly one babel call across both. Plumbs an optional transpileCoordinator through createRealm so each peer's coordinator hits the same pg LISTEN channel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
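The dependency_keys round trip can be sketched as a pair of helpers. The function names are hypothetical; the behavior follows the commit text: a Set<string> is stored as a JSON string array, and a NULL column from a pre-rollout row reads back as an empty Set. Rejecting malformed JSON is this sketch's own choice.

```typescript
// Hypothetical helpers for the dependency_keys column.
function serializeDependencyKeys(keys: Set<string>): string {
  return JSON.stringify(Array.from(keys)); // JSON string array
}

function parseDependencyKeys(raw: string | null): Set<string> {
  if (raw === null) return new Set(); // row written before this change
  const parsed: unknown = JSON.parse(raw);
  // This sketch rejects anything that is not an array of strings.
  if (!Array.isArray(parsed) || !parsed.every((k) => typeof k === "string")) {
    throw new Error("dependency_keys is not a JSON string array");
  }
  return new Set<string>(parsed);
}
```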
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
testRealm.write + __testOnlyClearCaches each tombstone-and-bump the target path's L2 row during setup, so the row's generation when the test reaches its explicit tombstone INSERT is unpredictable (it was 5 when CI expected 1). The OCC guard property under test ("a stale write that captured generation N can't UPSERT when the row is past N") doesn't depend on starting from gen=0. So: capture the gen before the explicit tombstone, assert the tombstone bumped it, run the stale write, and assert the gen stayed at the post-tombstone value. The semantic property still holds; the test is now robust to setup fan-out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
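The relative-assertion pattern can be sketched with Node's assert module standing in for QUnit's assert (both expose notStrictEqual and ok). The in-memory table and helpers are hypothetical; the point is that every check compares generations to each other rather than to a fixed starting value.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical in-memory model; the real test runs QUnit against pg.
type Row = { body: string | null; generation: number };

function occUpsert(table: Map<string, Row>, key: string, incoming: Row): boolean {
  const existing = table.get(key);
  if (existing !== undefined && existing.generation > incoming.generation) return false;
  table.set(key, incoming);
  return true;
}

function tombstoneAndBump(table: Map<string, Row>, key: string): void {
  const gen = table.get(key)?.generation ?? 0;
  table.set(key, { body: null, generation: gen + 1 });
}

// Setup writes and cache clears bump the row an unpredictable number of times.
function setupWithFanOut(table: Map<string, Row>, key: string, bumps: number): void {
  for (let i = 0; i < bumps; i++) tombstoneAndBump(table, key);
}

// Relative assertions: only the ordering of generations matters.
function assertStaleWriteRejected(table: Map<string, Row>, key: string): void {
  const preTombstoneGen = table.get(key)?.generation ?? 0;
  tombstoneAndBump(table, key);
  const postTombstoneGen = table.get(key)?.generation;
  // Two single-condition assertions instead of one compound `a != null && a > b`.
  assert.notStrictEqual(postTombstoneGen, undefined);
  assert.ok((postTombstoneGen as number) > preTombstoneGen, "tombstone bumped the generation");

  const landed = occUpsert(table, key, { body: "stale", generation: preTombstoneGen });
  assert.equal(landed, false, "stale write captured before the tombstone is rejected");
  assert.equal(table.get(key)?.generation, postTombstoneGen, "generation stayed at the post-tombstone value");
}
```

Because the checks are relative, the same assertions pass whether setup left the row at generation 0 or at 5.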
Two CI fixes:
1. Lint (qunit/no-assert-logical-expression at line 887): split the
compound `postTombstoneGen != null && postTombstoneGen > preTombstoneGen`
assertion into a `notStrictEqual(..., undefined)` followed by an
`assert.ok` on the delta. Same semantic check; satisfies the
single-condition assertion rule.
2. Flaky test ("identity-checked cleanup: A in-flight + invalidate +
B in-flight + A settles → B survives"): `waitForInflight(1)`
confirms only that the in-flight slot is set, which happens BEFORE
the transpile hook fires. Under CI load A could still be racing
toward the hook when the test ran `currentGate = gateB`, so when
A finally invoked the closure `() => currentGate` it captured
`gateB` instead of `gateA`. `releaseA()` then released a gate A
wasn't parked at, A's request never responded, and the test timed
out at 60s with "socket hang up." Replace the mutable reference
with call-index routing inside the hook and synchronize on
explicit per-call entry signals so the order is deterministic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
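The call-index routing from the flaky-test fix can be sketched with two small helpers. Deferred and IndexedGates are hypothetical test utilities, not the PR's code. Each hook invocation takes its index at the moment it runs, resolves its own entered signal, and parks on its own release gate, so no request can pick up a gate meant for another one (the failure mode of the old `() => currentGate` closure).

```typescript
// A promise plus its resolver, so tests can signal and wait explicitly.
class Deferred<T = void> {
  promise: Promise<T>;
  resolve!: (value: T) => void;
  constructor() {
    this.promise = new Promise<T>((r) => {
      this.resolve = r;
    });
  }
}

class IndexedGates {
  private calls = 0;
  readonly entered: Deferred[] = [];
  readonly release: Deferred[] = [];

  constructor(n: number) {
    for (let i = 0; i < n; i++) {
      this.entered.push(new Deferred());
      this.release.push(new Deferred());
    }
  }

  // Transpile-hook body: the call index is fixed when the hook runs, so a late
  // arrival cannot read a gate that was reassigned for a different request.
  hook = async (): Promise<number> => {
    const i = this.calls++;
    this.entered[i].resolve(); // per-call entry signal
    await this.release[i].promise; // per-call gate
    return i;
  };
}
```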
backspace approved these changes on May 12, 2026
Summary
- `module_transpile_cache` table (UNLOGGED, RAM-backed, like the `modules` definition cache). Keyed on `(realm_url, canonical_path)`. Columns: nullable body/headers/dependency_keys (so tombstones can sit in the row without bytes), `generation` BIGINT NOT NULL DEFAULT 0, created_at.
- `Realm.#moduleCache` from purely in-memory L1 to L1+L2. L1 stays in-process for hot-path latency; L2 is the cross-process shared layer.
- `generation` at the L2 read step (or 0 if absent), transpiles, then UPSERTs with that captured value via `ON CONFLICT DO UPDATE WHERE existing.generation <= EXCLUDED.generation`. An invalidate that lands during the transpile bumps the row past the captured value, so the writer's UPSERT is rejected by the WHERE clause and a stale transpile started before the invalidate cannot resurrect the row.
- `#deleteTranspileCacheRow` UPSERTs `body=NULL, generation = generation+1` rather than physically DELETEing. Physical DELETE would let a slow in-flight writer's INSERT succeed (no conflict, no row to compare against) and resurrect the stale bytes. Bulk wipe (`#deleteAllTranspileCacheRows` from `__testOnlyClearCaches`) likewise UPDATEs all rows to tombstones with bumped gen.
- `ModuleCacheCoordinator` (CS-10953) and `MODULE_CACHE_POPULATED_CHANNEL`. The coalesce key prefix (`transpile|...` vs the CachingDefinitionLookup shape) keeps the two flows separate; waiters dispatch off the bounded int64 hash, so crosstalk is a benign hash miss.
- `main.ts` passes the existing `moduleCacheCoordinator` as `transpileCoordinator` to every Realm it constructs.

The CS-11028 generation guard protects the in-memory L1 write; the new DB-resident `generation` column protects the L2 write. Together they close both layers of the invalidate-during-transpile race.

Linear: https://linear.app/cardstack/issue/CS-11030
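The L1+L2 split described in the summary can be sketched as a two-layer cache. The L1 generation guard here is an assumed shape for CS-11028's guard, which the PR references but does not show; SharedL2 and TwoLayerCache are hypothetical names.

```typescript
// Hypothetical two-layer lookup: L1 is a per-process Map with an in-memory
// generation guard; L2 is any shared store such as module_transpile_cache.
interface SharedL2 {
  read(key: string): Promise<string | undefined>;
}

class TwoLayerCache {
  private l1 = new Map<string, string>();
  private l1Generation = new Map<string, number>();
  private l2: SharedL2;
  private transpile: (key: string) => Promise<string>;

  constructor(l2: SharedL2, transpile: (key: string) => Promise<string>) {
    this.l2 = l2;
    this.transpile = transpile;
  }

  invalidate(key: string): void {
    this.l1.delete(key);
    this.l1Generation.set(key, (this.l1Generation.get(key) ?? 0) + 1);
    // The PR also fires the L2 tombstone-and-bump here (omitted in this sketch).
  }

  async get(key: string): Promise<string> {
    const hot = this.l1.get(key);
    if (hot !== undefined) return hot; // L1 hit: no I/O
    const captured = this.l1Generation.get(key) ?? 0;
    const body = (await this.l2.read(key)) ?? (await this.transpile(key));
    // L1 guard: skip the write if an invalidate landed while we were awaiting.
    if ((this.l1Generation.get(key) ?? 0) === captured) this.l1.set(key, body);
    return body;
  }
}
```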
Test plan
- Realm.#moduleCache L2 module_transpile_cache (DB-backed) module: module_transpile_cache
- `invalidateCache` tombstones the L2 row and bumps generation
- `ModuleCacheCoordinator` tests still pass; the coordinator is unchanged, we just route a second flow through it.
- `realm-endpoints-test.ts` still pass (etag, 304, content-type).

Follow-ups (NOT in this PR)
- `module-cache-coordination-test.ts` for the transpile flow (gated babel + advisory-lock contention → exactly one transpile across both instances).
- `#deleteAllTranspileCacheRows` bumps existing rows but doesn't tombstone paths that didn't yet have a row, so an in-flight writer for one of those paths could still resurrect. Narrow, only reachable from `__testOnlyClearCaches`.
- `extractModuleDependencyKeys` recomputation in fallbackHandle.

🤖 Generated with Claude Code