refactor(api): consolidate persist path onto persistParsedSpec (fixes silent ref loss on POST /parse)#88
Conversation
Closes #53. Delete persistTree from src/api/parse.ts; route POST /parse through persistParsedSpec, the same code path already used by MCP and CLI loaders. Behavior changes are intentional bug fixes: 1. Refs persistence: workers now emit refs (extracted from parser output); API path inserts them into spec_references. Previously dropped silently by workerOutputSchema before persistTree was called. 2. Upsert on (section, source): re-POSTing the same fixture returns the same specId via INSERT … ON CONFLICT DO UPDATE instead of failing on the unique constraint. 3. Replace-on-reparse: persistParsedSpec DELETEs old spec_references + paragraphs before reinserting. No stale rows after a re-parse. Pure refactor + Zod schema addition — no DB schema changes, no migrations. SecRefSchema added to src/ast/schemas.ts (sibling of SecRef interface). parse-worker.ts WorkerOutput extended with refs (sec/txt/docx all emit).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR adds a SecRef Zod schema and type, returns parsed refs from workers, validates and forwards refs through the parse job to a centralized persistParsedSpec call (removing inline persistTree), and adds unit and integration tests for refs persistence and replace-on-reparse. ChangesReference Extraction and Persistence Pipeline
Sequence DiagramsequenceDiagram
participant parseHandler
participant processParseJob
participant parseWorker
participant persistParsedSpec
parseHandler->>processParseJob: trigger async parse job
processParseJob->>parseWorker: parsePool.run(buffer, ext)
parseWorker-->>processParseJob: { tree, refs, ... }
processParseJob->>persistParsedSpec: persistParsedSpec({ tree, refs })
persistParsedSpec-->>processParseJob: specId
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/api/parse.ts (1)
91-100:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove the
as WorkerOutputcast, but keep the inline schema unchanged.The cast violates the coding guideline requiring no type assertions across module boundaries. However, the proposed fix to use
SpecTreeSchemais incorrect—the inline schema intentionally usesz.unknown()for parts and has weaker validation thanSpecTreeSchemabecause it validates raw worker output, not final validated trees. Replacing it would change the validation semantics and likely cause runtime failures (e.g.,SpecTreeSchemarequires UUIDs foridand regex-validatedsection, which the raw output may not satisfy).Simply remove the cast:
- const { tree, refs, capabilities } = workerOutputSchema.parse(workerRaw) as WorkerOutput; + const { tree, refs, capabilities } = workerOutputSchema.parse(workerRaw);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/parse.ts` around lines 91 - 100, Remove the module-bound type assertion by deleting the trailing "as WorkerOutput" cast on the workerOutputSchema definition while keeping the inline Zod schema exactly as-is (the object using z.unknown() for parts and the existing ParseWarningSchema and SecRefSchema references). Do not replace the inline schema with SpecTreeSchema; keep workerOutputSchema, ParseWarningSchema and SecRefSchema untouched and let TypeScript infer the schema type instead of using a cross-module type assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/ast/schemas.ts`:
- Around line 34-40: Change SecRefSchema to validate sourceNodeId as a UUID (use
z.string().uuid() or z.uuid()) and enforce target-specific required fields by
making the schema a discriminated union on targetType: create two variants for
targetType === 'section' (require targetSpecSection non-empty, disallow
standardCode) and for targetType === 'standard' (require standardCode non-empty,
disallow targetSpecSection), keep referenceText in both; replace the current
z.object(...) SecRefSchema with that discriminated union so invalid combinations
are rejected.
---
Outside diff comments:
In `@src/api/parse.ts`:
- Around line 91-100: Remove the module-bound type assertion by deleting the
trailing "as WorkerOutput" cast on the workerOutputSchema definition while
keeping the inline Zod schema exactly as-is (the object using z.unknown() for
parts and the existing ParseWarningSchema and SecRefSchema references). Do not
replace the inline schema with SpecTreeSchema; keep workerOutputSchema,
ParseWarningSchema and SecRefSchema untouched and let TypeScript infer the
schema type instead of using a cross-module type assertion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 7f028e66-6fda-448e-9aa6-12d661dff0ef
📒 Files selected for processing (6)
src/api/parse.integration.test.tssrc/api/parse.test.tssrc/api/parse.tssrc/ast/index.tssrc/ast/schemas.tssrc/lib/parse-worker.ts
…NodeId as UUID
CodeRabbit flagged SecRefSchema as allowing invalid combinations
(targetType='section' without targetSpecSection, and similarly for
'standard'/standardCode). sourceNodeId was typed as z.string() despite
all node IDs being UUIDs generated by uuidv4().
Replace SecRefSchema with z.discriminatedUnion('targetType', [...])
that enforces:
- 'section' branch: targetSpecSection required (min length 1), standardCode disallowed
- 'standard' branch: standardCode required (min length 1), targetSpecSection disallowed
- sourceNodeId: z.uuid() (Zod v4 strict UUID validation)
Mirror the same discriminated-union shape in the SecRef TS type so the
type and the schema can't drift apart.
Update one test fixture in parse.test.ts to use a valid v4 UUID for
sourceNodeId (was 'node-1', failed z.uuid() strict bit validation).
Existing producers in src/parser/sec/index.ts and src/parser/refs/extract.ts
already always set the correct target field per branch, so no producer
changes needed.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/ast/types.ts (1)
37-51: ⚡ Quick winInfer
SecReffromSecRefSchemato avoid contract drift.This union now hand-duplicates the same shape already enforced in
src/ast/schemas.ts. A later edit can update one side and miss the other, which is the kind of divergence this PR is removing elsewhere.♻️ Proposed refactor
-import { NodeTypeSchema } from './schemas.js'; +import { NodeTypeSchema, SecRefSchema } from './schemas.js'; @@ -export type SecRef = - | { - readonly sourceNodeId: string; - readonly targetType: 'section'; - readonly targetSpecSection: string; - readonly standardCode?: undefined; - readonly referenceText: string; - } - | { - readonly sourceNodeId: string; - readonly targetType: 'standard'; - readonly standardCode: string; - readonly targetSpecSection?: undefined; - readonly referenceText: string; - }; +export type SecRef = z.infer<typeof SecRefSchema>;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/ast/types.ts` around lines 37 - 51, The SecRef union duplicates the shape defined in SecRefSchema causing potential contract drift; replace the manual union by deriving SecRef from the Zod schema: import the SecRefSchema symbol from src/ast/schemas.ts and export SecRef as the inferred type using z.infer<typeof SecRefSchema> (referencing z.infer and SecRefSchema), and remove the hand-written union so the type stays in sync with the schema.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/ast/types.ts`:
- Around line 37-51: The SecRef union duplicates the shape defined in
SecRefSchema causing potential contract drift; replace the manual union by
deriving SecRef from the Zod schema: import the SecRefSchema symbol from
src/ast/schemas.ts and export SecRef as the inferred type using z.infer<typeof
SecRefSchema> (referencing z.infer and SecRefSchema), and remove the
hand-written union so the type stays in sync with the schema.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4d381b9b-65ff-4f41-bc75-6431b518cc37
📒 Files selected for processing (3)
src/api/parse.test.tssrc/ast/schemas.tssrc/ast/types.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/api/parse.test.ts
Replaces hand-duplicated SecRef union with z.infer<typeof SecRefSchema>. SecRefSchema is now the single source of truth — eliminates contract drift risk between schema and type.
|
Follow-up on the latest CodeRabbit review-body items: Addressed (commit 4758282): Declined (with reasoning): |
Closes #53.
Summary
Issue #53 was filed expecting
persistTreeinsrc/api/parse.tsandpersistParsedSpecinsrc/mcp/tools.tsto be verbatim duplicates that would silently desync when Phase 3 added reference insertion. Inspection on 2026-05-18 found the duplicates had already diverged — the other way: MCP/CLI paths usepersistParsedSpec(with refs + upsert + replace-on-reparse); the API path still used the simplerpersistTree. Workers emitrefsfromparser/index.ts, but the API path's Zod schema silently stripped them.This PR consolidates onto
persistParsedSpecfrom the API path (full inheritance). The fix is a pure refactor plus one Zod schema addition — no DB schema changes, no new migrations.Behavior changes (intentional fixes — please review carefully)
refs(sec/txt return them from the parser; docx routes throughextractRefsFromTree).persistParsedSpecinserts them intospec_references. Previously dropped silently on the API path.(section, source). Re-POSTing the same fixture returns the samespecIdviaINSERT … ON CONFLICT DO UPDATEinstead of failing on the unique constraint.persistParsedSpecDELETEs oldspec_references+paragraphsfor the spec before reinserting. Re-parse yields a clean tree, no stale rows.All three preserve the API response contract (
{ specId, section, title, nodeCount, ... }). No HTTP behavior change. Any external caller depending on conflict failure on duplicate POST would now succeed — none known in-tree.Files changed
src/api/parse.ts— deletepersistTree(and itspool/createSpec/insertTreeimports); extendworkerOutputSchemato validaterefsviaSecRefSchemawith.default([]); callpersistParsedSpec({ tree, refs }).src/ast/schemas.ts— addSecRefSchemaZod schema mirroring theSecRefinterface; re-exported fromsrc/ast/index.ts.src/lib/parse-worker.ts— extendWorkerOutputwithrefs; sec/txt return parser refs unchanged, docx pipes throughextractRefsFromTree. This had to move in lockstep with the API parse refactor, otherwise behavior change chore(ci): bump actions/setup-node from 4 to 6 #1 above wouldn't be exercised (workers were stripping refs upstream of the schema).src/api/parse.test.ts— swapcreateSpecmock forpersistParsedSpec; assert refs flow through (populated worker refs, omitted refs default to[], empty refs pass through).src/api/parse.integration.test.ts— POST01_11_00.SEC(has SRF tags) and assertspec_referencesrows present; re-POST same fixture and assert samespecId+ replaced paragraph IDs + refs still present.createSpecis intentionally left insrc/db/queries/specs.ts— still used by 4 test files and the db barrel.Test plan
pnpm lint(ESLint + tsc + prettier) — greenpnpm test— 465 unit tests pass (including 3 newprocessParseJob refs persistence (#53)cases)pnpm test:integration— 107 tests pass (including 2 newPOST /parse — refs persistence + replace-on-reparse (#53)cases)grep -n 'persistTree' src/api/parse.tsreturns zero hitsOut of scope
createSpec(still used by tests + barrel re-export)Summary by CodeRabbit
Tests
Refactor