
feat(evaluation): layout-aware FormSpec generation and quality evaluation#134

Open
danielnaab wants to merge 15 commits into main from story-121/form-layout


@danielnaab (Member) commented May 6, 2026

Summary

  • Adds a layout-aware FormSpec generator (generateFormSpecWithLayout) that breaks forms into logical multi-page sections with adaptive sizing, topic-cohesive grouping, and conditional-page guidance
  • Registers sonnet-hybrid-layout-v1 extraction variant that plugs in the layout generator via a new formSpecGenerator extension point on createBedrockPdfExtractor
  • Adds a layout-quality evaluation kind with a Bedrock LLM judge, an "evaluate layout" CLI subcommand, and experiment results for 4 government PDFs

Story

Closes #121

Acceptance Criteria

  • An audit of current form output identifies specific issues with examples (catalog/experiments/layout-quality/findings.md)
  • Forms with more than ~10 fields are broken into logical sections or multi-step flows (enforced by the generateFormSpecWithLayout prompt)
  • Related fields are grouped with descriptive section headings (layout prompt enforces topic-cohesive grouping)
  • The form layout works well on mobile viewports (requires UI-layer changes; out of scope for the generation pipeline)
  • Accessibility review confirms screen reader compatibility (out of scope for this branch)
  • At least one real-world form tested end-to-end (4 government PDFs evaluated; results in catalog/experiments/layout-quality/)
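The ~10-field criterion is enforced through the prompt rather than in code. A deterministic post-hoc check could flag violations during evaluation; the following is a minimal sketch assuming a simplified page/field shape. SimpleFormSpec and findOversizedPages are hypothetical names, not part of the project's FormSpec schema.

```typescript
// Hypothetical, simplified shape; the real FormSpec schema is not shown in this PR.
interface SimpleField {
  id: string;
}

interface SimplePage {
  title: string;
  fields: SimpleField[];
}

interface SimpleFormSpec {
  pages: SimplePage[];
}

// Returns the titles of pages holding more fields than the soft limit.
// The default of 10 mirrors the "~10 fields" acceptance criterion.
function findOversizedPages(spec: SimpleFormSpec, maxFields = 10): string[] {
  return spec.pages
    .filter((page) => page.fields.length > maxFields)
    .map((page) => page.title);
}

// Usage: a single-page form with 12 fields is flagged; a two-page split is not.
const fields: SimpleField[] = Array.from({ length: 12 }, (_, i) => ({ id: `f${i}` }));
const singlePage: SimpleFormSpec = { pages: [{ title: "All fields", fields }] };
const split: SimpleFormSpec = {
  pages: [
    { title: "Applicant", fields: fields.slice(0, 6) },
    { title: "Household", fields: fields.slice(6) },
  ],
};
const flaggedSingle = findOversizedPages(singlePage); // contains only "All fields"
const flaggedSplit = findOversizedPages(split); // empty
```

Because the limit is soft ("~10"), a check like this is better reported as a warning than used as a hard failure.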

Test Plan

  • bun run check passes (1439 tests, type check clean; pre-existing lint warnings on main unchanged)
  • test/form-documents/layout-prompt.test.ts — layout prompt content, statistics, schema
  • test/evaluation/layout-quality.test.ts — score normalization, NaN protection via Zod schema, summarize
  • Architecture dependency rule tests pass
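The layout-quality tests cover score normalization, NaN protection, and summarizing. The PR does the validation with a Zod schema; to keep this sketch dependency-free, a hand-written guard plays the same role. The 1 to 5 rating scale, the dimension names, and the helpers parseJudgeRatings, normalize, and summarize are assumptions for illustration.

```typescript
// Assumed judge rating scale; the PR does not state the actual range.
const MIN_RATING = 1;
const MAX_RATING = 5;

// Hypothetical per-dimension ratings returned by the judge.
type JudgeRatings = Record<string, number>;

// Stand-in for the PR's Zod schema: accept only finite numbers in range, so
// malformed model output is rejected instead of turning into NaN downstream.
function parseJudgeRatings(raw: unknown): JudgeRatings {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("judge response is not an object");
  }
  const ratings: JudgeRatings = {};
  for (const [dimension, value] of Object.entries(raw)) {
    if (typeof value !== "number" || !Number.isFinite(value) || value < MIN_RATING || value > MAX_RATING) {
      throw new Error(`invalid rating for ${dimension}: ${String(value)}`);
    }
    ratings[dimension] = value;
  }
  return ratings;
}

// Map a rating onto [0, 1] so dimensions can be averaged and reported as percentages.
function normalize(rating: number): number {
  return (rating - MIN_RATING) / (MAX_RATING - MIN_RATING);
}

// Average the normalized score of each dimension across fixtures.
function summarize(runs: JudgeRatings[]): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const run of runs) {
    for (const [dimension, rating] of Object.entries(run)) {
      const entry = (totals[dimension] ??= { sum: 0, count: 0 });
      entry.sum += normalize(rating);
      entry.count += 1;
    }
  }
  const summary: Record<string, number> = {};
  for (const [dimension, { sum, count }] of Object.entries(totals)) {
    summary[dimension] = sum / count;
  }
  return summary;
}

// Usage with two fixtures.
const summary = summarize([
  parseJudgeRatings({ titleClarity: 5, topicCohesion: 3 }),
  parseJudgeRatings({ titleClarity: 4, topicCohesion: 3 }),
]);
// summary.titleClarity === 0.875, summary.topicCohesion === 0.5
```

Rejecting bad output at the parse boundary means a single malformed judge response fails loudly rather than silently poisoning every average it touches.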

Review Notes

Extension point design: formSpecGenerator on BedrockExtractorOptions accepts (model, spec, activityStore?, userId?, projectId?, modelId?) => Promise<FormSpec>, so cost tracking is preserved regardless of which generator is active. generateFormSpecWithLayout matches this signature, so the registry passes it in directly.
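A minimal sketch of how an optional generator hook with this signature can be threaded through an extractor. Only the FormSpecGenerator parameter list comes from the notes above; Model, FormSpec, ActivityStore, ExtractorOptionsSketch, createExtractorSketch, and the stubs are placeholder assumptions, since createBedrockPdfExtractor itself is not shown.

```typescript
// Placeholder types; the real model, FormSpec and activity store types are not shown in the PR.
type Model = { id: string };
type FormSpec = { pages: { title: string }[] };
interface ActivityStore {
  record(entry: { userId?: string; projectId?: string; modelId?: string }): void;
}

// Signature quoted in the review notes.
type FormSpecGenerator = (
  model: Model,
  spec: FormSpec,
  activityStore?: ActivityStore,
  userId?: string,
  projectId?: string,
  modelId?: string,
) => Promise<FormSpec>;

interface ExtractorOptionsSketch {
  formSpecGenerator?: FormSpecGenerator;
}

// Stand-in for the existing default generator (generateFormSpec in the PR).
const defaultGenerator: FormSpecGenerator = async (_model, spec, store, userId, projectId, modelId) => {
  store?.record({ userId, projectId, modelId });
  return spec;
};

// Hypothetical extractor step: whichever generator is active receives the same
// tracking arguments, so cost tracking does not depend on the choice.
// Forwarding model.id as modelId is an assumption of this sketch.
function createExtractorSketch(options: ExtractorOptionsSketch = {}) {
  const generate = options.formSpecGenerator ?? defaultGenerator;
  return (model: Model, spec: FormSpec, store?: ActivityStore, userId?: string, projectId?: string): Promise<FormSpec> =>
    generate(model, spec, store, userId, projectId, model.id);
}

// A layout-style generator stub plugged in through the hook.
const layoutStub: FormSpecGenerator = async (_model, spec, store, userId, projectId, modelId) => {
  store?.record({ userId, projectId, modelId });
  // A real layout generator would call the model here; the stub just appends a page.
  return { pages: [...spec.pages, { title: "Review" }] };
};

const extractWithLayout = createExtractorSketch({ formSpecGenerator: layoutStub });
```

Keeping the tracking parameters in the hook's signature, rather than wrapping the generator, is what lets a variant swap generators without losing activity records.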

Evaluation infrastructure: The layout-quality kind follows the factory pattern from createLlmJudgeKind: the judge is injected at construction rather than set through mutable global state.
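The factory pattern can be sketched as follows. EvaluationKindSketch, LayoutJudge, and createLayoutQualityKindSketch are simplified, hypothetical stand-ins for the project's EvaluationKind interface and createLayoutQualityKind; only the idea of capturing the judge at construction and the unused _groundTruth parameter come from this PR.

```typescript
// Simplified, hypothetical shapes; the project's EvaluationKind interface is not shown.
type LayoutScores = Record<string, number>;

type LayoutJudge = (formSpecJson: string) => Promise<LayoutScores>;

interface EvaluationKindSketch<Output, GroundTruth> {
  name: string;
  score(output: Output, groundTruth: GroundTruth): Promise<LayoutScores>;
}

// Factory: the judge is captured in a closure at construction time, so there is
// no module-level setter (like the removed setLayoutJudge) and two kinds with
// different judges can coexist, e.g. a real judge and a fake one in tests.
function createLayoutQualityKindSketch(judge: LayoutJudge): EvaluationKindSketch<object, unknown> {
  return {
    name: "layout-quality",
    // Layout quality is judged without ground truth; the parameter exists only to
    // satisfy the interface, hence the underscore prefix.
    score: async (output, _groundTruth) => judge(JSON.stringify(output)),
  };
}
```

With a setter, every test that installs a fake judge must remember to restore the real one; with a factory, each kind owns its judge and nothing leaks between instances.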

Scope discipline: The sonnet-hybrid-layout-v1 variant is registered as experimental. Promotion to the production default is deferred pending #132 (deterministic conditional page injection), which addresses the main gap identified in the findings.

@danielnaab danielnaab temporarily deployed to story-121-form-layout May 6, 2026 19:36 Inactive
danielnaab added 13 commits May 7, 2026 05:25
The formSpecSchema requires these fields. Also commits baseline and
layout variant evaluation results showing +17.7pp overall improvement.

Documents methodology, per-fixture results, and recommendations.
Key finding: +17.7pp overall improvement with largest gains in title
clarity (+43.7pp), topic cohesion (+37.5pp), and page sizing (+31.3pp).
Conditional page use and delivery mode identified as areas for iteration.

…ance

Delivery mode: removed overly conservative "default to static" and
replaced with content-complexity-based criteria (narrative fields,
sensitive topics, eligibility logic → conversational).

Conditional pages: added explicit instructions for deriving page-level
conditions from field-level conditions, with a worked example in the
schema. Modest improvement (+6.3pp) but the inference remains hard for
a prompt-only approach.

Results: overall 77.1% (+19.8pp vs baseline). Delivery mode regression
eliminated. Conditional use improved from 37.5% to 43.8%.
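Deriving page-level conditions from field-level conditions is done by the prompt here, and #132 proposes a deterministic post-processing step instead. The following is a minimal sketch of one possible deterministic rule, not the design of #132: if every field on a page shares the same condition, hoist it to the page. The Condition, Field, and Page shapes and liftPageConditions are hypothetical.

```typescript
// Hypothetical condition shape: "show when field `fieldId` equals `equals`".
interface Condition {
  fieldId: string;
  equals: string;
}

interface Field {
  id: string;
  condition?: Condition;
}

interface Page {
  title: string;
  fields: Field[];
  condition?: Condition;
}

function sameCondition(a: Condition, b: Condition): boolean {
  return a.fieldId === b.fieldId && a.equals === b.equals;
}

// If every field on a page carries the same condition, hoist it to the page.
// Pages that already have a condition, have no fields, contain an
// unconditional field, or mix conditions are returned unchanged.
function liftPageConditions(pages: Page[]): Page[] {
  return pages.map((page) => {
    const first = page.fields[0]?.condition;
    if (page.condition || !first) return page;
    const target: Condition = first;
    const shared = page.fields.every(
      (f) => f.condition !== undefined && sameCondition(f.condition, target),
    );
    return shared ? { ...page, condition: target } : page;
  });
}

// Usage: the spouse page becomes conditional; the mixed page does not.
const demoPages: Page[] = [
  { title: "Applicant", fields: [{ id: "hasSpouse" }] },
  {
    title: "Spouse",
    fields: [
      { id: "spouseName", condition: { fieldId: "hasSpouse", equals: "yes" } },
      { id: "spouseDob", condition: { fieldId: "hasSpouse", equals: "yes" } },
    ],
  },
  {
    title: "Other",
    fields: [{ id: "notes" }, { id: "spouseJob", condition: { fieldId: "hasSpouse", equals: "yes" } }],
  },
];
const lifted = liftPageConditions(demoPages);
```

A rule this strict never invents a condition the fields do not already carry, which is the property a prompt-only approach struggles to guarantee.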
Final results after prompt iteration: +19.8pp overall (57.3% → 77.1%).
Delivery mode regression eliminated. Conditional page use improved
modestly (+6.3pp) but confirmed as a prompt-difficulty ceiling.
Follow-up filed as #132 for deterministic post-processing approach.
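The gains in these commits are percentage-point (pp) differences between mean scores, which are not the same as relative change. The following small worked example uses the reported overall scores (57.3% to 77.1%); the helper names are illustrative only.

```typescript
// Scores are percentages in [0, 100], as reported in the findings.
function percentagePointDelta(baseline: number, variant: number): number {
  return variant - baseline;
}

// Relative change expressed as a percentage of the baseline score.
function relativeChangePercent(baseline: number, variant: number): number {
  return ((variant - baseline) / baseline) * 100;
}

// Round to one decimal place to match the reporting precision.
const round1 = (x: number): number => Math.round(x * 10) / 10;

const ppGain = round1(percentagePointDelta(57.3, 77.1)); // 19.8
const relativeGain = round1(relativeChangePercent(57.3, 77.1)); // 34.6
```

The same 19.8pp gain is roughly a 34.6% relative improvement, so the unit matters when comparing results across iterations.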
Replace module-level mutable state (setLayoutJudge) with a factory
function (createLayoutQualityKind) that takes the judge as a parameter.
Consistent with the existing createLlmJudgeKind pattern.

- Add Zod schema validation for layout judge response (prevents NaN from
  malformed model output)
- Add activity tracking to generateFormSpecWithLayout (matches generateFormSpec)
- Update formSpecGenerator type to accept activity-tracking params
- Add evaluationRunSchema.parse() before writing layout evaluation results
- Fix score() signature to include _groundTruth parameter per EvaluationKind interface
- Remove erroneous groundTruth filter in layout evaluation subcommand
- Export buildLayoutPrompt from form-documents public index
- Fix test import to use public index instead of internal path
- Remove buildLayoutJudgePrompt from evaluation public index (implementation detail)
@danielnaab danielnaab force-pushed the story-121/form-layout branch from 0e4af86 to b5685ad May 7, 2026 05:36
@danielnaab danielnaab temporarily deployed to story-121-form-layout May 7, 2026 05:36 Inactive
@danielnaab danielnaab changed the title feat(form-documents): layout-aware FormSpec generation feat(evaluation): layout-aware FormSpec generation and quality evaluation May 7, 2026
@danielnaab danielnaab temporarily deployed to story-121-form-layout May 7, 2026 05:37 Inactive

Development

Successfully merging this pull request may close these issues.

Optimize generated form layout to follow best practices

1 participant