Skip to content

refactor(api): consolidate persist path onto persistParsedSpec (fixes silent ref loss on POST /parse)#88

Merged
thewrz merged 3 commits into
mainfrom
feat/issue-53
May 20, 2026
Merged

refactor(api): consolidate persist path onto persistParsedSpec (fixes silent ref loss on POST /parse)#88
thewrz merged 3 commits into
mainfrom
feat/issue-53

Conversation

@thewrz
Copy link
Copy Markdown
Contributor

@thewrz thewrz commented May 19, 2026

Closes #53.

Summary

Issue #53 was filed expecting persistTree in src/api/parse.ts and persistParsedSpec in src/mcp/tools.ts to be verbatim duplicates that would silently desync when Phase 3 added reference insertion. Inspection on 2026-05-18 found the duplicates had already diverged — the other way: MCP/CLI paths use persistParsedSpec (with refs + upsert + replace-on-reparse); the API path still used the simpler persistTree. Workers emit refs from parser/index.ts, but the API path's Zod schema silently stripped them.

This PR consolidates onto persistParsedSpec from the API path (full inheritance). The fix is a pure refactor plus one Zod schema addition — no DB schema changes, no new migrations.

Behavior changes (intentional fixes — please review carefully)

  1. Refs persistence. Workers now emit refs (sec/txt return them from the parser; docx routes through extractRefsFromTree). persistParsedSpec inserts them into spec_references. Previously dropped silently on the API path.
  2. Upsert on (section, source). Re-POSTing the same fixture returns the same specId via INSERT … ON CONFLICT DO UPDATE instead of failing on the unique constraint.
  3. Replace-on-reparse. persistParsedSpec DELETEs old spec_references + paragraphs for the spec before reinserting. Re-parse yields a clean tree, no stale rows.

All three preserve the API response contract ({ specId, section, title, nodeCount, ... }). No HTTP behavior change. Any external caller depending on conflict failure on duplicate POST would now succeed — none known in-tree.

Files changed

  • src/api/parse.ts — delete persistTree (and its pool/createSpec/insertTree imports); extend workerOutputSchema to validate refs via SecRefSchema with .default([]); call persistParsedSpec({ tree, refs }).
  • src/ast/schemas.ts — add SecRefSchema Zod schema mirroring the SecRef interface; re-exported from src/ast/index.ts.
  • src/lib/parse-worker.ts — extend WorkerOutput with refs; sec/txt return parser refs unchanged, docx pipes through extractRefsFromTree. This had to move in lockstep with the API parse refactor, otherwise behavior change chore(ci): bump actions/setup-node from 4 to 6 #1 above wouldn't be exercised (workers were stripping refs upstream of the schema).
  • src/api/parse.test.ts — swap createSpec mock for persistParsedSpec; assert refs flow through (populated worker refs, omitted refs default to [], empty refs pass through).
  • src/api/parse.integration.test.ts — POST 01_11_00.SEC (has SRF tags) and assert spec_references rows present; re-POST same fixture and assert same specId + replaced paragraph IDs + refs still present.

createSpec is intentionally left in src/db/queries/specs.ts — still used by 4 test files and the db barrel.

Test plan

  • pnpm lint (ESLint + tsc + prettier) — green
  • pnpm test — 465 unit tests pass (including 3 new processParseJob refs persistence (#53) cases)
  • pnpm test:integration — 107 tests pass (including 2 new POST /parse — refs persistence + replace-on-reparse (#53) cases)
  • grep -n 'persistTree' src/api/parse.ts returns zero hits
  • LOC delta: 209 lines (well under 500 cap)

Out of scope

Summary by CodeRabbit

  • Tests

    • Added integration and unit tests verifying reference persistence, upsert-of-spec identity on re-upload, and replace-on-reparse behavior for paragraph content.
  • Refactor

    • Parsing now extracts and persists reference metadata alongside specs; re-parsing preserves spec identity while replacing paragraph content and retaining references.

Review Change Stack

Closes #53.

Delete persistTree from src/api/parse.ts; route POST /parse through
persistParsedSpec, the same code path already used by MCP and CLI loaders.
Behavior changes are intentional bug fixes:

1. Refs persistence: workers now emit refs (extracted from parser output);
   API path inserts them into spec_references. Previously dropped silently
   by workerOutputSchema before persistTree was called.
2. Upsert on (section, source): re-POSTing the same fixture returns the
   same specId via INSERT … ON CONFLICT DO UPDATE instead of failing on
   the unique constraint.
3. Replace-on-reparse: persistParsedSpec DELETEs old spec_references +
   paragraphs before reinserting. No stale rows after a re-parse.

Pure refactor + Zod schema addition — no DB schema changes, no migrations.
SecRefSchema added to src/ast/schemas.ts (sibling of SecRef interface).
parse-worker.ts WorkerOutput extended with refs (sec/txt/docx all emit).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4e33f08c-320a-4ff2-a976-d4e949976a1a

📥 Commits

Reviewing files that changed from the base of the PR and between 7451770 and 4758282.

📒 Files selected for processing (1)
  • src/ast/types.ts

📝 Walkthrough

Walkthrough

This PR adds a SecRef Zod schema and type, returns parsed refs from workers, validates and forwards refs through the parse job to a centralized persistParsedSpec call (removing inline persistTree), and adds unit and integration tests for refs persistence and replace-on-reparse.

Changes

Reference Extraction and Persistence Pipeline

Layer / File(s) Summary
Reference schema and types
src/ast/schemas.ts, src/ast/index.ts, src/ast/types.ts
Adds SecRefSchema (Zod), re-exports it, and makes exported SecRef a z.infer<typeof SecRefSchema> discriminated union.
Worker contract and refs extraction
src/lib/parse-worker.ts
WorkerOutput now includes refs: SecRef[]; parseWorker returns {tree, refs} for .sec and .docx, and {tree, refs, capabilities} for .txt.
Parse handler refs validation and persistence
src/api/parse.ts
Imports SecRefSchema and persistParsedSpec; extends workerOutputSchema with refs (default []); destructures refs and calls persistParsedSpec({ tree: finalTree, refs }); removes inline persistTree.
Unit tests for refs persistence behavior
src/api/parse.test.ts
Mocks updated: parsePool.run resolves {tree, refs}; persistParsedSpec mocked; beforeEach clears mocks; new tests assert forwarding/defaulting/preservation of refs to persistParsedSpec.
Integration tests for refs persistence and replace-on-reparse
src/api/parse.integration.test.ts
Adds postSecFixture() helper and tests that spec_references rows are created, re-parsing upserts the same specId, and paragraph rows are replaced (no id overlap) while refs persist.

Sequence Diagram

sequenceDiagram
  participant parseHandler
  participant processParseJob
  participant parseWorker
  participant persistParsedSpec
  parseHandler->>processParseJob: trigger async parse job
  processParseJob->>parseWorker: parsePool.run(buffer, ext)
  parseWorker-->>processParseJob: { tree, refs, ... }
  processParseJob->>persistParsedSpec: persistParsedSpec({ tree, refs })
  persistParsedSpec-->>processParseJob: specId
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • wrzonance/SpecR#14: Introduced SEC cross-reference model and initial refs extraction that this PR builds upon.
  • wrzonance/SpecR#76: Adds DOCX ref extraction (extractRefsFromTree) which this PR consumes for docx refs.
  • wrzonance/SpecR#71: Modifies worker/pool integration that this PR extends by adding refs to worker outputs.

Poem

🐰 I nibble schemas in the night,

gathered refs by lantern light,
workers hum and hand them o'er,
persisted hops across the floor,
tests applaud — the specs take flight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main refactoring: consolidating the persist path onto persistParsedSpec and fixing silent ref loss on POST /parse. It directly reflects the changeset's primary objective.
Linked Issues check ✅ Passed The PR fully addresses issue #53's objectives: consolidating persist logic onto persistParsedSpec, eliminating src/api/parse.ts:persistTree duplication, extending schema to validate refs via SecRefSchema, ensuring refs are persisted to spec_references with upsert semantics, and replacing prior rows on reparse.
Out of Scope Changes check ✅ Passed All changes are directly related to the consolidation objective: schema exports (SecRefSchema), type inference updates (SecRef), worker output extension (refs return), and API persist path refactoring. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/issue-53

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/api/parse.ts (1)

91-100: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove the as WorkerOutput cast, but keep the inline schema unchanged.

The cast violates the coding guideline requiring no type assertions across module boundaries. However, the proposed fix to use SpecTreeSchema is incorrect—the inline schema intentionally uses z.unknown() for parts and has weaker validation than SpecTreeSchema because it validates raw worker output, not final validated trees. Replacing it would change the validation semantics and likely cause runtime failures (e.g., SpecTreeSchema requires UUIDs for id and regex-validated section, which the raw output may not satisfy).

Simply remove the cast:

-    const { tree, refs, capabilities } = workerOutputSchema.parse(workerRaw) as WorkerOutput;
+    const { tree, refs, capabilities } = workerOutputSchema.parse(workerRaw);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/api/parse.ts` around lines 91 - 100, Remove the module-bound type
assertion by deleting the trailing "as WorkerOutput" cast on the
workerOutputSchema definition while keeping the inline Zod schema exactly as-is
(the object using z.unknown() for parts and the existing ParseWarningSchema and
SecRefSchema references). Do not replace the inline schema with SpecTreeSchema;
keep workerOutputSchema, ParseWarningSchema and SecRefSchema untouched and let
TypeScript infer the schema type instead of using a cross-module type assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/ast/schemas.ts`:
- Around line 34-40: Change SecRefSchema to validate sourceNodeId as a UUID (use
z.string().uuid() or z.uuid()) and enforce target-specific required fields by
making the schema a discriminated union on targetType: create two variants for
targetType === 'section' (require targetSpecSection non-empty, disallow
standardCode) and for targetType === 'standard' (require standardCode non-empty,
disallow targetSpecSection), keep referenceText in both; replace the current
z.object(...) SecRefSchema with that discriminated union so invalid combinations
are rejected.

---

Outside diff comments:
In `@src/api/parse.ts`:
- Around line 91-100: Remove the module-bound type assertion by deleting the
trailing "as WorkerOutput" cast on the workerOutputSchema definition while
keeping the inline Zod schema exactly as-is (the object using z.unknown() for
parts and the existing ParseWarningSchema and SecRefSchema references). Do not
replace the inline schema with SpecTreeSchema; keep workerOutputSchema,
ParseWarningSchema and SecRefSchema untouched and let TypeScript infer the
schema type instead of using a cross-module type assertion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 7f028e66-6fda-448e-9aa6-12d661dff0ef

📥 Commits

Reviewing files that changed from the base of the PR and between 150fbbd and f6aef9e.

📒 Files selected for processing (6)
  • src/api/parse.integration.test.ts
  • src/api/parse.test.ts
  • src/api/parse.ts
  • src/ast/index.ts
  • src/ast/schemas.ts
  • src/lib/parse-worker.ts

Comment thread src/ast/schemas.ts Outdated
…NodeId as UUID

CodeRabbit flagged SecRefSchema as allowing invalid combinations
(targetType='section' without targetSpecSection, and similarly for
'standard'/standardCode). sourceNodeId was typed as z.string() despite
all node IDs being UUIDs generated by uuidv4().

Replace SecRefSchema with z.discriminatedUnion('targetType', [...])
that enforces:
 - 'section' branch: targetSpecSection required (min length 1), standardCode disallowed
 - 'standard' branch: standardCode required (min length 1), targetSpecSection disallowed
 - sourceNodeId: z.uuid() (Zod v4 strict UUID validation)

Mirror the same discriminated-union shape in the SecRef TS type so the
type and the schema can't drift apart.

Update one test fixture in parse.test.ts to use a valid v4 UUID for
sourceNodeId (was 'node-1', failed z.uuid() strict bit validation).

Existing producers in src/parser/sec/index.ts and src/parser/refs/extract.ts
already always set the correct target field per branch, so no producer
changes needed.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/ast/types.ts (1)

37-51: ⚡ Quick win

Infer SecRef from SecRefSchema to avoid contract drift.

This union now hand-duplicates the same shape already enforced in src/ast/schemas.ts. A later edit can update one side and miss the other, which is the kind of divergence this PR is removing elsewhere.

♻️ Proposed refactor
-import { NodeTypeSchema } from './schemas.js';
+import { NodeTypeSchema, SecRefSchema } from './schemas.js';
@@
-export type SecRef =
-  | {
-      readonly sourceNodeId: string;
-      readonly targetType: 'section';
-      readonly targetSpecSection: string;
-      readonly standardCode?: undefined;
-      readonly referenceText: string;
-    }
-  | {
-      readonly sourceNodeId: string;
-      readonly targetType: 'standard';
-      readonly standardCode: string;
-      readonly targetSpecSection?: undefined;
-      readonly referenceText: string;
-    };
+export type SecRef = z.infer<typeof SecRefSchema>;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ast/types.ts` around lines 37 - 51, The SecRef union duplicates the shape
defined in SecRefSchema causing potential contract drift; replace the manual
union by deriving SecRef from the Zod schema: import the SecRefSchema symbol
from src/ast/schemas.ts and export SecRef as the inferred type using
z.infer<typeof SecRefSchema> (referencing z.infer and SecRefSchema), and remove
the hand-written union so the type stays in sync with the schema.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/ast/types.ts`:
- Around line 37-51: The SecRef union duplicates the shape defined in
SecRefSchema causing potential contract drift; replace the manual union by
deriving SecRef from the Zod schema: import the SecRefSchema symbol from
src/ast/schemas.ts and export SecRef as the inferred type using z.infer<typeof
SecRefSchema> (referencing z.infer and SecRefSchema), and remove the
hand-written union so the type stays in sync with the schema.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4d381b9b-65ff-4f41-bc75-6431b518cc37

📥 Commits

Reviewing files that changed from the base of the PR and between f6aef9e and 7451770.

📒 Files selected for processing (3)
  • src/api/parse.test.ts
  • src/ast/schemas.ts
  • src/ast/types.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/api/parse.test.ts

Replaces hand-duplicated SecRef union with z.infer<typeof SecRefSchema>.
SecRefSchema is now the single source of truth — eliminates contract
drift risk between schema and type.
@thewrz
Copy link
Copy Markdown
Contributor Author

thewrz commented May 20, 2026

Follow-up on the latest CodeRabbit review-body items:

Addressed (commit 4758282): SecRef is now z.infer<typeof SecRefSchema> in src/ast/types.ts — eliminates the hand-duplicated union and the contract-drift risk CodeRabbit flagged.

Declined (with reasoning): as WorkerOutput cast at src/api/parse.ts:117 — CodeRabbit suggested removing it outright, but the inline workerOutputSchema deliberately uses z.array(z.unknown()) for tree.parts to avoid double-validation against SpecNodeSchema (worker already returns typed SpecNode[]). Without the cast, tree.parts infers as unknown[] and the downstream const finalTree: SpecTree = { ...tree, ... } assignment fails strict-mode type checking. CLAUDE.md does forbid 'type assertions across module boundaries' — so this is a genuine design tension worth a separate ticket to resolve cleanly (likely via a Zod .transform() or a narrower in-module type alias). Filing as a follow-up; not blocking this PR's persistence-consolidation goal.

@thewrz thewrz merged commit b0f3eee into main May 20, 2026
5 checks passed
@thewrz thewrz deleted the feat/issue-53 branch May 20, 2026 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor(db): extract persistSpec to db layer — eliminate mcp/parse.ts duplication

1 participant