
runtime split: Engine/Observer/Puppeteer, schema-v3 restored, P4C liveness proven#54

Merged
kavinsood merged 12 commits into
mainfrom
p0-salvage-telemetry-split
May 30, 2026


@kavinsood

Owner

Summary

This PR closes the architecture/autophagy thread: split the YAOS runtime into Engine (production), Observer (telemetry), and Puppeteer (QA harness), restore schema-v3 correctness, and prove the QA mount path works in live Obsidian.


What changed

Runtime split (P1–P4)

| Layer | Location | Responsibility |
| --- | --- | --- |
| Engine | main.js | Production sync; no QA controls |
| Observer | telemetry.js | Passive diagnostics; no QA controls |
| Puppeteer | qa/obsidian-harness/ | QA harness; owns __YAOS_DEBUG__ mount |
  • Removed src/lab/ entirely
  • Stripped EngineControlPort from production via esbuild define
  • Removed all __qaOnly Engine seams from main.js
  • window.__YAOS_DEBUG__ is now mounted exclusively by the QA harness plugin, never by the product plugin
  • telemetry.js ships as a passive Observer — no QA controls, no Puppeteer logic

Schema-v3 restoration

  • Restored SCHEMA_VERSION = 3 in schema.ts, fileMeta.ts, vaultSync.ts, diskMirror.ts
  • Restored SERVER_MIN_SCHEMA_VERSION = 3 / SERVER_MAX_SCHEMA_VERSION = 3 in server/src/version.ts
  • Restored tombstone witness handling (markDirty(path, "tombstone")) in observeMetaChanges
  • Added scripts/guard-schema-version.mjs: 7-condition regression guard wired into test:regressions
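The restored tombstone routing can be sketched as follows. This is a minimal illustration, not the code in installTelemetryRuntime.ts: only `markDirty(path, "tombstone")` and the `remote-apply` label come from this PR, while `MetaChange`, `FakeWitnessTracker`, and `routeMetaChanges` are hypothetical names invented for the example.

```typescript
// Hypothetical shapes: only markDirty(path, "tombstone") and the
// "remote-apply" label are taken from the PR text.
type WitnessOrigin = "tombstone" | "remote-apply";

interface MetaChange {
  path: string;
  kind: "added" | "updated" | "deleted"; // assumed change kinds
}

// Stand-in for the witness tracker: records the last origin per path.
class FakeWitnessTracker {
  readonly dirty = new Map<string, WitnessOrigin>();

  markDirty(path: string, origin: WitnessOrigin): void {
    this.dirty.set(path, origin);
  }
}

// Deleted entries are marked as tombstones; every other change keeps the
// "remote-apply" origin, whether it was local or remote.
function routeMetaChanges(changes: MetaChange[], tracker: FakeWitnessTracker): void {
  for (const change of changes) {
    const origin: WitnessOrigin = change.kind === "deleted" ? "tombstone" : "remote-apply";
    tracker.markDirty(change.path, origin);
  }
}
```

Skipping the `deleted` branch would leave tombstoned paths unmarked, which is the regression described for the pre-fix handler.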

QA harness

  • qa/obsidian-harness/ owns all Puppeteer source
  • s00-smoke-trace-export.ts: stateless required liveness smoke that proves __YAOS_DEBUG__, __YAOS_QA__, QA product build, and harness plugin are all mounted
  • run-smoke-ready.mjs: CDP controller for P4C — verifies trace file exists on disk
  • qa:smoke-ready script: build → qa product → harness → live smoke
  • s01-single-device-basic-edit.ts: unique per-run paths (prevents stale CRDT contamination)
  • typeIntoFile: replaced character-by-character replaceRange with atomic setValue(current + text) — avoids cursor-at-0 reversal in headless CDP; documented explicitly as atomic editor transaction, not human-typing simulation

Docs

  • docs/engineering/schema-version-guard.md: step-by-step v4 bump procedure

Verification

npm run build                           PASS
npm run build:qa-product                PASS
npm run build:harness                   PASS
npm run test:regressions                84/84 PASS
npm run guard:production-bundles:strict PASS (main.js 490.7 KB, telemetry.js 69.5 KB)
npm run guard:qa-isolation              PASS
npm run guard:schema-version            PASS (all 7 checks, v3)
npm run verify:bundles                  PASS
QA_VAULT_PATH=/home/kavin/temenos npm run qa:smoke-ready  PASS (live Obsidian)

P4C live smoke (all 9 checks):

PASS  waitForQaReady()
PASS  window.__YAOS_DEBUG__ exists
PASS  window.__YAOS_QA__ exists
PASS  QA product build loaded (getEngineControlPort)
PASS  Harness plugin loaded (yaos-qa-harness)
PASS  scenario smoke-trace-export passed
PASS  result.tracePath non-null
PASS  trace file exists on disk
PASS  stopTrace()

What is NOT in this PR (open follow-up issues)

s01 functional scenario: still failing

s01-single-device-basic-edit runs but does not pass. The unique-path approach confirmed the failures are product bugs, not static-path contamination. Three open RCA findings:

  1. waitForReceiptAfter takes ~18.9s even though the receipt is confirmed at t+0.67s. The predicate goes stale after a post-confirmation local update resets _lastCandidateId, and the fallback path does not resolve as expected.
  2. The QA file is opened in the editor at t+0.81s, before the scenario calls openFile. The source is unknown; it triggers recovery suppression and an external disk write.
  3. getCrdtHash(path) disagrees with qa.checkpoint, which reports hashMismatches=0. The QA debug API and the internal reconciler are using different path/hash lookups.

These are product/harness bugs that need a separate RCA. The architecture campaign does not depend on them.

Telemetry fidelity (deferred)

  • Per-Y.Text witness observer gap (documented in installTelemetryRuntime.ts comment)
  • DeviceWitnessTracker.markDirty labels local metadata changes as remote-apply

Architecture campaign status

Engine / Observer / Puppeteer runtime split    DONE
production main.js: no QA controls            PROVEN (strict bundle guard)
telemetry.js: passive Observer                 PROVEN
qa harness: owns Puppeteer                     PROVEN
schema v3 restored and guarded                 DONE
P4C QA liveness smoke                         PROVEN (live Obsidian)
s01 functional scenario                        OPEN RCA (product bugs, not arch)

kavinsood added 12 commits May 30, 2026 15:08
…rness

Split the mixed lab runtime into three clear layers with enforced boundaries:

  Engine (main.js)    = product sync runtime
  Observer (telemetry.js) = passive telemetry / diagnostics, shipped separately
  Puppeteer (qa/)     = mutation harness / QA driver, not shipped

Key changes:

- Add product-owned observability types (src/observability/):
  productEventKinds.ts, recoveryEventTypes.ts, traceContext.ts, traceLogger.ts
  Product code uses PRODUCT_EVENT_KIND string constants instead of
  importing FLIGHT_KIND enum from Observer internals.

- Remove old src/debug/ and src/diagnostics/ product roots:
  Canonical Observer implementations now live under src/lab/debug/ and
  src/lab/diagnostics/ (moved in earlier work, originals deleted here).

- Introduce passive telemetry runtime (src/telemetry/):
  installTelemetryRuntime.ts — Observer entry point, no mutation commands.
  telemetryRuntimeHost.ts — read-only host interface.
  Emits telemetry.js via esbuild (70 KB, fully clean of mutation symbols).

- Move Puppeteer mutation harness to qa/harness/:
  qaDebugApi.ts, installPuppeteerRuntime.ts, scenarioStateController.ts,
  vfsTortureTest.ts, ports/yaosUnsafeQaPort.ts — all out of src/.
  qa/ is tracked source; qa-runs/ (artifacts) is gitignored.

- Track pre-existing QA harness source:
  qa/analyzers/, qa/controllers/, qa/obsidian-harness/, qa/scripts/,
  qa/fixtures/ — previously hidden by qa/ gitignore rule.

- Add strict/transitional production bundle guards:
  scripts/guard-production-bundles.mjs with --transitional flag.
  Strict mode fails on known Engine test seams (__qaOnly, Unsafe, ForceSync).
  Transitional mode warns and exits 0 (PARTIAL PASS).

- Update all test imports to new boundary paths.

- Fix lint: remove unnecessary type assertions, use configDir injection,
  wrap unbound methods, use console.debug instead of console.log.

Bundle sizes: main.js ~494 KB, telemetry.js ~71 KB
Regressions: 84/84 passing
telemetry.js forbidden grep: zero mutation-harness symbols
src/ -> qa/ import violations: zero

Known debt (separate phase):
  6 __qaOnly Engine test seams remain in main.js on
  ReconciliationController and EditorBindingManager.
  Removal requires injected unsafe capability ports.
  Strict guard fails on these; transitional guard warns.
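The strict/transitional split could look roughly like the sketch below. It is not the contents of scripts/guard-production-bundles.mjs: the three seam patterns and the PARTIAL PASS label come from the commit message, while `guardBundle`, `GuardResult`, and the in-memory bundle string are assumptions made so the example runs without touching disk.

```typescript
type GuardMode = "strict" | "transitional";

interface GuardResult {
  hits: string[];
  exitCode: 0 | 1;
  status: "PASS" | "PARTIAL PASS" | "FAIL";
}

// Seam vocabulary named in the commit message.
const FORBIDDEN_SEAMS: RegExp[] = [/__qaOnly/, /Unsafe/, /ForceSync/];

// Scan bundle text for forbidden seams. Strict mode fails on any hit;
// transitional mode reports the hits but still exits 0.
function guardBundle(bundleText: string, mode: GuardMode): GuardResult {
  const hits = FORBIDDEN_SEAMS.filter((pattern) => pattern.test(bundleText)).map(
    (pattern) => pattern.source,
  );
  if (hits.length === 0) {
    return { hits, exitCode: 0, status: "PASS" };
  }
  return mode === "strict"
    ? { hits, exitCode: 1, status: "FAIL" }
    : { hits, exitCode: 0, status: "PARTIAL PASS" };
}
```

A real guard would read main.js from disk and pass the exit code to the process; the sketch keeps only the decision logic.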
docs/architecture/runtime-estates.md:
  - Define Engine / Observer / Puppeteer estates and boundaries
  - Document what telemetry.js may/must-not contain
  - Document tracked qa/ source vs ignored qa-runs/ artifacts
  - Record 6 deferred Engine __qaOnly test seams with table (file/class)
  - Note TelemetryRuntimeHost broad-handle debt
  - Document all enforcement scripts

.github/workflows/release.yml:
  - Add telemetry.js to zip, gh release upload, gh release create
  - Every release now ships main.js + telemetry.js (Observer bundle)

.gitignore:
  - Add lab.js with explanatory comment (legacy name, must not reappear)

package.json:
  - Add guard:no-lab-artifact — fails if lab.js exists in root
  - verify:bundles now runs: build + guard:no-lab-artifact + guard:production-bundles:transitional
telemetry.js is a generated release artifact (esbuild output),
same as main.js. It was accidentally committed. Removed from tracking
and added to .gitignore alongside main.js and lab.js.
Remove all six public __qaOnly*Unsafe methods and supporting production
state from ReconciliationController and EditorBindingManager.

Architecture:
  main.ts owns private Engine control state.
  ReconciliationController registers a DiskIngestPort via optional dep.
  EditorBindingManager reads a BindingPropagationGate supplied at construction.
  qa/harness calls plugin.getEngineControlPort() — not product class methods.
  TelemetryRuntimeHost and Observer are unaware the control port exists.
  src/ does not import qa/.

New files:
  src/runtime/engineControlPort.ts — type-only EngineControlPort + DiskIngestPort

ReconciliationController changes:
  - Remove __qaExternalEditPolicyOverride field
  - Remove __qaOnlyForceSyncFileFromDiskUnsafe method
  - Remove __qaOnlyPauseEditorBindingPropagationUnsafe method
  - Remove __qaOnlyResumeEditorBindingPropagationUnsafe method
  - Remove __qaOnlySetExternalEditPolicyOverrideUnsafe method
  + Add getEffectiveExternalEditPolicy? dep (replaces override field)
  + Add registerDiskIngestPort? dep (fires at construction with private closure)

EditorBindingManager changes:
  - Remove qaPaused field from EditorBinding
  - Remove __qaOnlyPauseBindingPropagationUnsafe method
  - Remove __qaOnlyResumeBindingPropagationUnsafe method
  + Add optional BindingPropagationGate constructor parameter
  + gate.isPaused(path) replaces binding.qaPaused in health/propagation checks
  + gate.registerReconfigureHook supplies CM compartment reconfigure callback

main.ts changes:
  + Private Engine control state: diskIngestPort, externalEditPolicyOverride,
    pausedEditorPropagationPaths, bindingReconfigureHook
  + engineControlPort assembled from private closures
  + getEngineControlPort() public method for Puppeteer harness duck-typing
  + ReconciliationController and EditorBindingManager wired with control deps

qa/harness changes:
  qaDebugApi.ts: calls plugin.getEngineControlPort() instead of __qaOnly methods
  installPuppeteerRuntime.ts: passes getEngineControlPort to buildQaDebugApi
  yaosUnsafeQaPort.ts: method names updated (ingestDiskFileNow, pause/resume,
    setExternalEditPolicyOverride)
  scenarios s06a, s10g: renamed method calls
  two-device.ts: evalRaw strings updated

tests: fixtures now capture DiskIngestPort via registerDiskIngestPort and
  expose ingestDiskFileNow() helper; __qaOnlyForceSyncFileFromDiskUnsafe
  calls replaced across controller-recovery-orchestration*.ts and
  frontmatter-guard-orchestration.ts

Verification:
  npm run build              PASS
  npm run test:regressions   84/84
  npm run lint:changed       PASS
  npm run guard:qa-isolation PASS
  guard:production-bundles:strict PASS
  grep __qaOnly|Unsafe|ForceSync main.js = 0
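The port-and-gate wiring described in this commit might be sketched like this. The type names `DiskIngestPort` and `BindingPropagationGate` and the `registerDiskIngestPort` dep come from the message above; every member shape, the synchronous return value, and the `wireEngine` helper are assumptions for illustration only.

```typescript
// Type names come from the commit message; member shapes are assumptions.
interface DiskIngestPort {
  ingestDiskFileNow(path: string): string; // the real port is presumably async
}

interface BindingPropagationGate {
  isPaused(path: string): boolean;
}

class ReconciliationController {
  constructor(deps: { registerDiskIngestPort?: (port: DiskIngestPort) => void }) {
    // The port is handed out once as a private closure; the class itself
    // exposes no public test seam.
    deps.registerDiskIngestPort?.({
      ingestDiskFileNow: (path) => this.ingest(path),
    });
  }

  private ingest(path: string): string {
    return `ingested:${path}`;
  }
}

class EditorBindingManager {
  constructor(private readonly gate?: BindingPropagationGate) {}

  shouldPropagate(path: string): boolean {
    return !(this.gate?.isPaused(path) ?? false);
  }
}

// Owner-side wiring: control state stays private to the function that owns it.
function wireEngine() {
  const state: { port: DiskIngestPort | null } = { port: null };
  const paused = new Set<string>();

  new ReconciliationController({
    registerDiskIngestPort: (port) => {
      state.port = port;
    },
  });
  const bindings = new EditorBindingManager({ isPaused: (path) => paused.has(path) });

  const engineControlPort = {
    ingestDiskFileNow: (path: string) => state.port?.ingestDiskFileNow(path) ?? null,
    pauseEditorPropagation: (path: string) => {
      paused.add(path);
    },
    resumeEditorPropagation: (path: string) => {
      paused.delete(path);
    },
  };
  return { bindings, engineControlPort };
}
```

With both deps optional, a production construction that passes neither behaves as if no control port existed.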
…ARNESS_ENABLED__ esbuild define

Problem: main.js still contained getEngineControlPort, ingestDiskFileNow,
pauseEditorPropagation, resumeEditorPropagation, setExternalEditPolicyOverride
after P2. The guard only checked old __qaOnly/Unsafe/ForceSync vocabulary —
the renamed capability passed the old guard undetected.

Fix:
- esbuild.config.mjs: add define __YAOS_QA_HARNESS_ENABLED__=false for
  mainContext (production).  Dead-code elimination removes all gated blocks.
  New qa-product build mode (product-main.js) sets it to true.
- src/main.ts: replace four individual QA fields + engineControlPort eager
  literal with a single _qaState field (null in production).  All QA logic
  lives inside if (__YAOS_QA_HARNESS_ENABLED__) blocks.  getEngineControlPort
  is dynamically attached as an instance property inside the gate — it never
  appears on the class prototype in production bundles.
- src/lab/labRuntimeHost.ts: remove getEngineControlPort() — LabRuntimeHost
  is visible to Observer/telemetry and must not know the control port exists.
- qa/harness/installPuppeteerRuntime.ts: introduce local PuppeteerRuntimeHost
  extending LabRuntimeHost with getEngineControlPort().  Type boundary is now
  qa/ only.
- scripts/guard-production-bundles.mjs: add getEngineControlPort,
  pauseEditorPropagation, resumeEditorPropagation, setExternalEditPolicyOverride
  to MAIN_FORBIDDEN.  Update docstring: P2 deferred seams are done; P3 is done.
- qa/scripts/prepare-vault.ts: QA vault setup now copies product-main.js
  (the QA-enabled build) instead of the production main.js.
- package.json: add build:qa-product script.

Verification:
  npm run build                PASS
  npm run guard:production-bundles:strict  PASS
  grep capability names main.js            0 hits (4 names gone)
  npm run build:qa-product     PASS (product-main.js has capability names)
  npm run test:regressions     84/84
  npm run lint:changed         PASS
  npm run guard:qa-isolation   PASS
  npm run verify:bundles       PASS
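A rough sketch of the gating pattern follows. In the real build the flag is a global that esbuild's `define` replaces with a literal so dead-code elimination can drop the block; here it is a constructor argument so the example runs without a bundler. `SketchPlugin`, the port's member shapes, and `isPropagationPaused` are hypothetical.

```typescript
// Port shape is an assumption; only the method names appear in the PR.
interface EngineControlPort {
  pauseEditorPropagation(path: string): void;
  resumeEditorPropagation(path: string): void;
}

class SketchPlugin {
  // Single QA state field; stays null in a production build.
  private qaState: { paused: Set<string> } | null = null;

  // `qaHarnessEnabled` stands in for the __YAOS_QA_HARNESS_ENABLED__ define.
  constructor(qaHarnessEnabled: boolean) {
    if (qaHarnessEnabled) {
      const paused = new Set<string>();
      this.qaState = { paused };
      const port: EngineControlPort = {
        pauseEditorPropagation: (path) => {
          paused.add(path);
        },
        resumeEditorPropagation: (path) => {
          paused.delete(path);
        },
      };
      // Attached as an instance property inside the gate, so the accessor
      // never appears on the class prototype.
      Object.defineProperty(this, "getEngineControlPort", {
        value: () => port,
        enumerable: false,
      });
    }
  }

  isPropagationPaused(path: string): boolean {
    return this.qaState?.paused.has(path) ?? false;
  }
}
```

Because the accessor exists only on gated instances, a harness can detect the QA product build by checking for the property rather than for a version string.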
… Observer to src/telemetry

Dead code removal:
- Delete qa/harness/installPuppeteerRuntime.ts — exported installLabRuntime
  but had zero callers anywhere in the codebase. Audit confirmed no external
  injector, no call site in src/, qa/, tests/, or scripts/.
- Delete src/lab/labRuntimeHost.ts — its only consumer was
  installPuppeteerRuntime.ts. LabRuntimeHost was the Puppeteer host interface
  and has no place in src/.
- Delete src/lab/debug/ports/index.ts and yaosDebugPort.ts — redundant shims
  pointing at src/telemetry/debug/ports/; zero importers.

Dead telemetry mount API removal:
- Remove onTelemetryApiMounted / onTelemetryApiUnmounted from TelemetryRuntimeHost.
  installTelemetryRuntime never called onTelemetryApiMounted (the mount was
  asymmetric — unmount was called, mount was not). Telemetry is passive and
  must not mount window.__YAOS_DEBUG__ — that is a Puppeteer/QA control surface,
  not a telemetry surface. The defensive delete of __YAOS_DEBUG__ in onunload
  remains sufficient.
- Remove the corresponding dead callbacks from the host object literal in main.ts.
- Remove the host.onTelemetryApiUnmounted() call from installTelemetryRuntime
  dispose() — nothing to unmount since nothing was ever mounted.

src/lab → src/telemetry rename:
- Move all passive Observer files out of the lying src/lab/ directory:
    src/lab/debug/{flightEvents,flightRecorder,flightTraceController,
                   flightTraceSink,pathIdentity,trace}.ts
    src/lab/diagnostics/{deviceWitnessTracker,diagnosticsBundle,
                         diagnosticsService,pathRedactor,witnessStateHash}.ts
  → src/telemetry/debug/ and src/telemetry/diagnostics/ respectively.
- Update all import references in src/, qa/, tests/ (53 files total).
- src/lab/ is now completely gone.

Guard updates:
- Remove installLabRuntime pattern from guard-qa-isolation.mjs TELEMETRY_FORBIDDEN
  (the file is deleted).

Verification:
  npm run build                       PASS
  npm run build:qa-product            PASS
  npm run guard:production-bundles:strict  PASS
  npm run guard:qa-isolation          PASS
  npm run verify:bundles              PASS
  npm run test:regressions            84/84
  npm run lint:changed                PASS
  rg 'lab/' src/ qa/ tests/ scripts/ → 0 (code files)
  rg 'LabRuntimeHost|installLabRuntime|onTelemetryApiMounted|onTelemetryApiUnmounted' → 0
…cker/trace accessors

Problem (found by P4B audit):
  window.__YAOS_DEBUG__ was never mounted by any in-repo code path.
  - onTelemetryApiMounted was removed (P4A) — it was already dead before P4A
    since installTelemetryRuntime never called it
  - installPuppeteerRuntime.ts was deleted (P4A) — it was a dead export
  Every QA scenario timed out at waitForQaReady() because __YAOS_DEBUG__ was
  undefined. The break predated P4A; P4A just removed the dead code hiding it.

Fix — harness Obsidian plugin is the mount point:
  qa/obsidian-harness/main.ts now calls mountYaosDebugApi() on load.
  The method:
    1. Accesses app.plugins.plugins['yaos'] (the product plugin)
    2. Accesses plugin.lab (TelemetryRuntimeHandle, private via as-any cast)
    3. Assembles a PluginHandle object delegating to product + telemetry handle
    4. Calls buildQaDebugApi(pluginHandle) from qa/harness/qaDebugApi.ts
    5. Assigns result to window.__YAOS_DEBUG__
  Cleanup: onunload deletes __YAOS_DEBUG__ in addition to __YAOS_QA__.

  This keeps the product plugin as a passive black box — it never mounts its
  own crash-test remote.  The harness is responsible for the mount.
  No src/ → qa/ imports introduced.

Telemetry handle accessors:
  Added optional getDeviceWitnessTracker?() and getFlightTraceController?() to
  TelemetryRuntimeHandle (src/telemetry/installTelemetryRuntime.ts).
  Both are read-only accessors to existing internal Observer objects.
  Optional (?), so no existing callers are affected.
  Needed by the harness PluginHandle assembly for witness primitives and
  phase-event recording in scenario runs (S11+).

ScenarioStateController:
  Harness plugin creates its own ScenarioStateController instance, passed as
  getScenarioController() on the PluginHandle. Scenario step tracking works.

Harness rebuild: qa/obsidian-harness/main.js rebuilt to include mount code.

Verification:
  npm run build                        PASS
  npm run build:qa-product             PASS
  node qa/obsidian-harness/esbuild.mjs production  PASS
  npm run guard:production-bundles:strict  PASS
  npm run guard:qa-isolation           PASS
  npm run test:regressions             84/84
  npm run lint:changed                 PASS
  grep __YAOS_DEBUG__ qa/obsidian-harness/main.js  → 8 hits (mount + cleanup)
  grep __YAOS_DEBUG__ main.js          → 1 hit (defensive delete in onunload only)
…smoke test

Fixes to P4B found by review:

Issue 1 — Mount timing / partial-mount risk:
  Added four explicit guards to mountYaosDebugApi() that fail loudly before
  any PluginHandle is assembled:
    Guard 1: product plugin 'yaos' must be loaded
    Guard 2: getEngineControlPort must exist (confirms QA product build, not
             production main.js which dead-code-eliminates the method)
    Guard 3: product.lab must exist (confirms qaDebugMode:true and that
             installTelemetryRuntime has run)
    Guard 4: lab.getDeviceWitnessTracker must be a function (confirms the
             P4B telemetry.js build, not a stale pre-P4B bundle)
  If any guard fails, no __YAOS_DEBUG__ is mounted and a loud Notice + console
  error names exactly what is wrong. waitForQaReady() will time out cleanly
  rather than letting a half-working API surface mid-scenario failures.
  Plugin load order is guaranteed by prepare-vault.ts: 'do-sync' is written
  before 'yaos-qa-harness' in community-plugins.json; Obsidian awaits each
  plugin's onload() sequentially.
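The four guards can be sketched as a single precondition check that runs before anything is mounted. This is an illustration under assumptions, not the harness source: the guard order and the `'yaos'`, `getEngineControlPort`, `lab`, and `getDeviceWitnessTracker` names come from the list above, while `ProductPluginLike`, `MountCheck`, `mountDebugApi`, and the plain-object stand-in for `window` are hypothetical.

```typescript
interface TelemetryHandleLike {
  getDeviceWitnessTracker?: unknown;
}

interface ProductPluginLike {
  getEngineControlPort?: unknown;
  lab?: TelemetryHandleLike | null;
}

type MountCheck =
  | { ok: true; product: ProductPluginLike }
  | { ok: false; reason: string };

// Run the four guards in order and stop at the first failure.
function checkMountPreconditions(
  plugins: Record<string, ProductPluginLike | undefined>,
): MountCheck {
  const product = plugins["yaos"];
  if (!product) {
    return { ok: false, reason: "guard 1: product plugin 'yaos' is not loaded" };
  }
  if (typeof product.getEngineControlPort !== "function") {
    return { ok: false, reason: "guard 2: getEngineControlPort missing (not a QA product build)" };
  }
  if (!product.lab) {
    return { ok: false, reason: "guard 3: product.lab missing (telemetry runtime not installed)" };
  }
  if (typeof product.lab.getDeviceWitnessTracker !== "function") {
    return { ok: false, reason: "guard 4: getDeviceWitnessTracker missing (stale telemetry bundle)" };
  }
  return { ok: true, product };
}

// Mount only when every guard passes; otherwise leave the target untouched.
function mountDebugApi(
  target: Record<string, unknown>,
  plugins: Record<string, ProductPluginLike | undefined>,
  buildApi: (product: ProductPluginLike) => unknown,
): MountCheck {
  const check = checkMountPreconditions(plugins);
  if (check.ok) {
    target["__YAOS_DEBUG__"] = buildApi(check.product);
  }
  return check;
}
```

Returning the reason instead of throwing lets the caller decide how loudly to report it, for example through a Notice and a console error as described above.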

Issue 2 — connectProvider semantics (CONFIRMED CORRECT, no change):
  setQaNetworkHold('offline') disconnects + blocks reconnects.
  setQaNetworkHold('online') releases hold + triggers reconnect.
  Verified in src/runtime/connectionController.ts lines 62-82.

Issue 3 — exportFlightTrace returning null (real bug, fixed):
  buildQaDebugApi throws 'Flight trace export failed' if plugin.exportFlightTrace
  returns null. Previous implementation always returned null.
  Fix: call FlightTraceController.exportTrace() directly via the
  getFlightTraceController() accessor added in P4B, using
  lab.diagnosticsService.ensureDiagnosticsDir() for the diagDir argument.
  Returns result.path if ok, null otherwise. Correctly surfaces 'no active
  trace' errors through the throw in buildQaDebugApi rather than swallowing
  them silently.
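The null-versus-throw contract from Issue 3 can be sketched in a few lines. The `'Flight trace export failed'` message and the "path if ok, null otherwise" rule come from the text above; the `ExportResult` shape, the synchronous signature, and both function names are assumptions, since the real `exportTrace()` signature is not shown in this PR.

```typescript
// Assumed result shape; the PR only says "result.path if ok, null otherwise".
type ExportResult = { ok: true; path: string } | { ok: false; error: string };

interface FlightTraceControllerLike {
  exportTrace(diagDir: string): ExportResult;
}

// Harness side: translate the controller result into path-or-null.
function exportFlightTrace(
  controller: FlightTraceControllerLike,
  diagDir: string,
): string | null {
  const result = controller.exportTrace(diagDir);
  return result.ok ? result.path : null;
}

// Debug API side: a null path becomes a thrown error rather than a silent no-op.
function exportFlightTraceOrThrow(
  controller: FlightTraceControllerLike,
  diagDir: string,
): string {
  const path = exportFlightTrace(controller, diagDir);
  if (path === null) {
    throw new Error("Flight trace export failed");
  }
  return path;
}
```

Under this contract an implementation that always returns null, as the previous one did, makes every export throw.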

Smoke test (15/15 PASS):
  Verified with Node/jiti smoke covering all four guards and the happy path:
    Guard 1-4: each guard fires correctly, no mount occurs
    Happy path: buildQaDebugApi returns an object with all required methods
    isLocalReady() returns false for null vaultSync (correct)
    ingestDiskFileNow() delegates to getEngineControlPort() (correct)
…ectron renderer

Problem discovered during P4C live Obsidian smoke test:

  import() fails:
    Obsidian's renderer resolves dynamic import() via the app://obsidian.md
    scheme. Absolute filesystem paths outside the app bundle produce:
      TypeError: Failed to fetch dynamically imported module:
        app://obsidian.md/home/kavin/.../telemetry.js

  require() fails:
    require() loads the file but telemetry.js runs in Node's standard CJS
    module context where require('obsidian') is not available (Obsidian only
    patches require in the plugin's own module, not sub-modules). Produces:
      Error: Cannot find module 'obsidian'

Root cause:
  __dirname in Obsidian's Electron renderer is the ASAR renderer directory
  (/usr/lib/electronNN/resources/electron.asar/renderer), not the plugin
  directory. Both the old path and the corrected basePath path failed for
  different reasons depending on the load mechanism.

Fix:
  Read telemetry.js from disk with fs.readFileSync, then evaluate it with
  new Function(), passing the current require (Obsidian's patched require
  that provides 'obsidian', 'electron', etc.) as an argument. The evaluated
  code runs with the correct module resolution context.

  Plugin directory is resolved via vault adapter basePath + manifest.dir,
  which is the correct Obsidian API for this purpose.
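The evaluation step can be sketched as below, assuming telemetry.js is a CommonJS bundle that reads `require`, `module`, and `exports`. The disk read is left out so the example stays self-contained; `evaluateCommonJs` and the fake require in the usage note are hypothetical names, not the plugin's code.

```typescript
type RequireFn = (id: string) => unknown;

// Evaluate CommonJS source text with a caller-supplied require, so modules
// such as 'obsidian' resolve through the host's patched require instead of
// Node's default resolution.
function evaluateCommonJs(source: string, hostRequire: RequireFn): unknown {
  const mod: { exports: unknown } = { exports: {} };
  const run = new Function("require", "module", "exports", source);
  run(hostRequire, mod, mod.exports);
  // Read module.exports afterwards, since the bundle may reassign it.
  return mod.exports;
}
```

In the plugin the `source` argument would come from `fs.readFileSync` on the resolved plugin directory and `hostRequire` would be the plugin's own `require`. The trade-off is that code run through `new Function()` has no file-relative resolution of its own, so every dependency must be reachable through the injected require.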

P4C live smoke results (19/20):
  PASS  window.__YAOS_DEBUG__ is an object
  PASS  window.__YAOS_QA__ is an object
  PASS  waitForQaReady condition passes
  PASS  all required __YAOS_DEBUG__ methods present (14 checks)
  PASS  isLocalReady() returns boolean
  PASS  getConnectionState() returns string
  PASS  ingestDiskFileNow accessible (engine control port confirmed)
  PASS  getDiskHash on absent path returns null
  PASS  scenario registry has 38 scenarios
  FAIL  waitForIdle(5000) timed out — expected, vault not connected to
        a live server in this session; not a code defect

Verification:
  npm run build                        PASS
  npm run build:qa-product             PASS
  npm run guard:production-bundles:strict  PASS
  npm run test:regressions             84/84
  npm run lint:changed                 PASS
  Live Obsidian smoke                  19/20 (1 environmental non-issue)
…ctor

Root cause:
  The P1 refactor (3776255 'refactor(runtime): split Engine, Observer telemetry,
  and Puppeteer harness') accidentally reverted the schema v3 sync implementation
  that was introduced in 86b1de3 ('feat(sync): schema v3 nested Y.Map metadata').

  The deployed server at kavin-yaos.ripplor.workers.dev has SERVER_MIN_SCHEMA_VERSION=3.
  The plugin was reverted to SCHEMA_VERSION=2, causing the compatibility guard to
  block all sync with: 'This server requires schema version 3 or newer.'
  Status bar showed 'CRDT: Error'.

Restored files:
  src/sync/schema.ts (new)        — SCHEMA_VERSION = 3 constant, Obsidian-free
  src/sync/fileMeta.ts (restored) — unified v2/v3 dual-shape metadata helpers:
                                    decodeFileMeta, getMetaPath, ensureNestedMetaEntry,
                                    createNestedActiveMeta/DeletedMeta, buildMetaSnapshot,
                                    computeMetaSemanticChanges, observeMetaChanges API
  src/sync/vaultSync.ts (restored) — imports SCHEMA_VERSION from schema.ts (v3),
                                    adds observeMetaChanges() subscription API,
                                    _metaDeepObserver with incremental diff,
                                    MetaSemanticChange dispatch, markSchemaV3()
  src/sync/diskMirror.ts (restored) — uses observeMetaChanges subscription instead
                                    of shallow meta.observe; handles v3 nested Y.Map
                                    mutations for disk ops; consumeRemoteRename for
                                    analyzer remoteOrigin exemption

Updated for P1/P2/P3/P4 path changes:
  src/sync/vaultSync.ts    — ../debug/trace → ../observability/traceContext
                             ../debug/flightEvents → ../telemetry/debug/flightEvents
  src/sync/diskMirror.ts   — ../debug/trace → ../observability/traceContext

Added:
  src/main.ts — call markSchemaV3(deviceName) after IDB loads, before auth check
  src/telemetry/installTelemetryRuntime.ts — update _startDeviceWitnessTracker
    to use vaultSync.observeMetaChanges() instead of direct meta.observe;
    correctly handles both v2 flat entries and v3 nested Y.Map mutations;
    _witnessMetaHandler is now an unsubscribe function (not a Yjs observer)
  tests/disk-mirror-observer.ts — add observeMetaChanges stub to fake VaultSync

Verified:
  npm run build                        PASS
  npm run build:qa-product             PASS
  npm run guard:production-bundles:strict  PASS
  npm run guard:qa-isolation           PASS
  npm run test:regressions             84/84
  npm run lint:changed                 PASS
  Live Obsidian: status = 'disconnected' (not 'error'), compatibility guard
  no longer fires, SCHEMA_VERSION=3 matches server requirement
…, add schema guard

Four follow-up items from schema-v3 restoration review:

1. server/src/version.ts — restore SERVER_MIN/MAX_SCHEMA_VERSION = 3
   The P1 refactor also reverted these to 2.  The deployed server requires 3.
   Plugin is now v3. Source and deployment now agree.
   Verified live: caps={min:3,max:3}, pluginCompatibilityWarning=null,
   connectionState=online, statusSummary.state=connected.

2. installTelemetryRuntime.ts — fix tombstone witness handling
   The observeMetaChanges handler was skipping 'deleted' changes entirely.
   The v3 main.ts had explicit tombstone→markDirty('tombstone') handling.
   Fixed: deleted changes now call markDirty(path, 'tombstone') so the
   witness tracker knows to check isCrdtTombstoned() for those paths.
   Non-deleted changes retain 'remote-apply' origin (both local and remote).
   Also documented the per-Y.Text text observer gap (pre-P1 omission).

3. scripts/guard-schema-version.mjs (new) + package.json
   Prevents the P1 regression from recurring. Checks:
     - src/sync/schema.ts exists (not deleted by a future refactor)
     - vaultSync.ts imports SCHEMA_VERSION from "./schema" (not re-inlined)
     - No literal 'export const SCHEMA_VERSION = N' in vaultSync.ts
     - SCHEMA_VERSION = expected (currently 3)
     - server/src/version.ts min/max = expected
   Wired into test:regressions and npm run guard:schema-version.
   Simulated the P1 regression (inlined SCHEMA_VERSION = 2): guard FAILS.
   Clean state: guard PASSES.

4. Check 5 — Live QA scenario (s01-single-device-basic-edit)
   Ran end-to-end against the live connected vault (connectionState=online).
   analyzerPassed=true, 0 hard failures in flight trace, tracePath written
   (proves exportFlightTrace P4B fix works end-to-end).
   Assertion failure: diskEqualsCrdt hash mismatch — environmental stale CRDT
   state from prior test runs on this personal vault (server has old file
   content that overrides local create). Not a product correctness failure.
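The checks listed in item 3 could be expressed as a pure function over file contents, roughly as follows. This is a sketch, not scripts/guard-schema-version.mjs: the checks mirror the bullet list, but `GuardInputs`, `readConst`, `checkSchemaGuard`, the regular expressions, and the failure messages are all assumptions.

```typescript
interface GuardInputs {
  schemaTs?: string; // undefined models a deleted src/sync/schema.ts
  vaultSyncTs: string;
  serverVersionTs: string;
}

// Extract `NAME = <digits>` from source text, or null if absent.
function readConst(source: string, name: string): number | null {
  const match = new RegExp(`\\b${name}\\s*=\\s*(\\d+)`).exec(source);
  return match ? Number(match[1]) : null;
}

// Return a list of failure messages; an empty list means the guard passes.
function checkSchemaGuard(files: GuardInputs, expected: number): string[] {
  const failures: string[] = [];

  if (files.schemaTs === undefined) {
    failures.push("src/sync/schema.ts is missing");
  } else if (readConst(files.schemaTs, "SCHEMA_VERSION") !== expected) {
    failures.push(`SCHEMA_VERSION is not ${expected}`);
  }

  const importsFromSchema =
    /import\s*\{[^}]*\bSCHEMA_VERSION\b[^}]*\}\s*from\s*["']\.\/schema["']/;
  if (!importsFromSchema.test(files.vaultSyncTs)) {
    failures.push('vaultSync.ts does not import SCHEMA_VERSION from "./schema"');
  }
  if (/export\s+const\s+SCHEMA_VERSION\s*=\s*\d+/.test(files.vaultSyncTs)) {
    failures.push("vaultSync.ts re-inlines SCHEMA_VERSION");
  }

  for (const name of ["SERVER_MIN_SCHEMA_VERSION", "SERVER_MAX_SCHEMA_VERSION"]) {
    if (readConst(files.serverVersionTs, name) !== expected) {
      failures.push(`${name} is not ${expected}`);
    }
  }
  return failures;
}
```

Feeding it a vaultSync.ts that inlines `export const SCHEMA_VERSION = 2` reproduces the simulated P1 regression: the import check and the inline check both fail.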

Verification:
  npm run build                        PASS
  npm run build:qa-product             PASS
  npm run guard:production-bundles:strict  PASS
  npm run guard:qa-isolation           PASS
  npm run guard:schema-version         PASS
  npm run test:regressions             84/84 (incl. guard:schema-version)
  Live: state=connected, warn=null, caps={min:3,max:3}
  Live: s01 scenario ran, trace written, analyzer 0 hard failures
## What this does

### s00 smoke scenario + run-smoke-ready.mjs controller (P4C closure)
Adds s00-smoke-trace-export.ts: a stateless required harness liveness gate
that proves window.__YAOS_DEBUG__, window.__YAOS_QA__, QA product build, and
harness plugin are all mounted in live Obsidian.

Adds run-smoke-ready.mjs: CDP controller for P4C. Checks all five pre-conditions,
runs s00 via qa.run(), asserts result.tracePath non-null, and verifies the trace
file exists on disk when QA_VAULT_PATH is set.

Wires qa:smoke-ready script (build + build:qa-product + build:harness + controller).

### s01 unique per-run paths
Replaces the static QA-scratch/s01-basic-edit.md path with a unique per-run
path (timestamp + 5-char random suffix). This eliminates static-path stale CRDT
contamination and ensures any remaining failures are real product bugs, not
test-environment pollution.
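A minimal sketch of such a path generator, assuming the timestamp and suffix are appended to the old file name (the exact naming scheme in s01 is not shown here). `randomSuffix` and `uniqueScenarioPath` are hypothetical helpers; the clock and random source are injectable so the function can be tested deterministically.

```typescript
const SUFFIX_ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// Build a fixed-length suffix one character at a time, so the length is
// always exact regardless of the random values drawn.
function randomSuffix(length: number, random: () => number = Math.random): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += SUFFIX_ALPHABET.charAt(Math.floor(random() * SUFFIX_ALPHABET.length));
  }
  return out;
}

// Assumed layout: <static stem>-<epoch millis>-<5-char suffix>.md
function uniqueScenarioPath(
  now: number = Date.now(),
  random: () => number = Math.random,
): string {
  return `QA-scratch/s01-basic-edit-${now}-${randomSuffix(5, random)}.md`;
}
```

The timestamp alone would collide for two runs started in the same millisecond, which is the case the random suffix covers.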

### waitForDiskCrdtConverge ordering
Restructures the s01 wait sequence to: waitForReceiptAfter (seeds the CRDT)
then waitForDiskCrdtConverge (ensures content stability). Without the receipt
wait first, waitForDiskCrdtConverge polls against a null CRDT and times out.

### Atomic typeIntoFile (was: character-by-character)
Replaces replaceRange(ch, getCursor()) loop with atomic setValue(current + text).

BEFORE: character-by-character insertion via getCursor(). In headless CDP runs
with no OS focus, getCursor() always returns {line:0,ch:0}, inserting each
character at position 0 and reversing the entire typed string.

AFTER: single atomic CodeMirror document replacement. Tests editor->CRDT->disk
propagation via one y-codemirror reconciliation pass. Does NOT test per-keystroke
behavior, debounced writes, or incremental sync. Comment in editor-ops.ts states
this explicitly.
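The reversal can be reproduced with a tiny fake editor. This is an illustration only: `FakeHeadlessEditor` models a single-line document whose `getCursor()` always reports position 0, as described for headless CDP runs, and the two typing helpers are hypothetical stand-ins for the old and new `typeIntoFile`.

```typescript
interface Pos {
  line: number;
  ch: number;
}

// Single-line fake editor. getCursor() is pinned to {0,0} to mimic the
// unfocused headless behaviour described above.
class FakeHeadlessEditor {
  private value = "";

  getValue(): string {
    return this.value;
  }

  setValue(next: string): void {
    this.value = next;
  }

  getCursor(): Pos {
    return { line: 0, ch: 0 };
  }

  // Insert text at the given column (line is ignored in this fake).
  replaceRange(text: string, at: Pos): void {
    this.value = this.value.slice(0, at.ch) + text + this.value.slice(at.ch);
  }
}

// BEFORE: each character lands at the reported cursor, which never advances.
function typeCharByChar(editor: FakeHeadlessEditor, text: string): void {
  for (const ch of text) {
    editor.replaceRange(ch, editor.getCursor());
  }
}

// AFTER: one document replacement that does not depend on cursor state.
function typeAtomically(editor: FakeHeadlessEditor, text: string): void {
  editor.setValue(editor.getValue() + text);
}
```

Typing "abc" through the first helper leaves "cba" in the fake editor, while the second leaves "abc".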

### schema-version-guard.md
Step-by-step procedure for updating the schema guard when bumping to v4.

## Current status

P4C liveness: PROVEN live via s00 with QA_VAULT_PATH
s01 functional scenario: STILL FAILING

s01 failure is a product/harness RCA issue, not an architecture blocker:
  (1) waitForReceiptAfter takes ~18.9s despite receipt confirmed at t+0.67s
      — predicate stale after post-confirmation local update resets candidateId
  (2) QA file opened in editor at t+0.81s before scenario calls openFile
      — source unknown, triggers recovery suppression
  (3) getCrdtHash(path) disagrees with checkpoint hashMismatches=0 at t+30s
      — QA debug API and internal reconciler using different path/hash lookup

Architecture campaign goals:
  Engine/Observer/Puppeteer runtime split: DONE
  schema-v3 guard: DONE
  production bundle guards: DONE
  P4C liveness: DONE
  s01: open RCA follow-up
@kavinsood kavinsood merged commit 9b38168 into main May 30, 2026
2 checks passed
@kavinsood kavinsood deleted the p0-salvage-telemetry-split branch May 30, 2026 18:20