Add pre-computed headwords with morph tokens to Entry model #2202

Draft
myieye wants to merge 5 commits into develop from claude/add-lexeme-headwords-TowRX

@myieye (Collaborator) commented Mar 10, 2026

Summary

This PR adds a pre-computed Headword property to the Entry model. The headword applies morph type tokens (leading/trailing markers such as "-" for affixes) to lexeme forms. It is computed when entries are loaded, for every writing system present in the entry's forms, so that search and display can work with the token-decorated form.

Key Changes

  • Entry Model Enhancement: Added Headword property to Entry as a MultiString containing pre-computed headwords for all writing systems with morph tokens applied

    • CitationForm takes priority when present
    • Otherwise: LeadingToken + LexemeForm + TrailingToken
    • Computed for all writing systems present in CitationForm or LexemeForm
  • Headword Computation:

    • Added ComputeHeadwords() method in EntryQueryHelpers for in-memory computation
    • Added HeadwordWithTokens() and HeadwordSearchValue() expression methods for SQL translation
    • Integrated computation into entry finalization pipeline via QueryHelpers.Finalize()
  • Search Service Updates:

    • Modified EntrySearchService.Filter() to accept WritingSystemId parameter
    • Updated ToEntrySearchRecord() to include headwords for all vernacular writing systems (space-separated)
    • Improved ranking to use per-WS headword instead of generic search record headword
  • Data Bridge Integration:

    • Updated FwDataMiniLcmApi.FromLexEntry() to compute headwords using LCM morph type data
    • Added ComputeHeadword() helper that mirrors the CRDT computation logic
  • API Updates:

    • Renamed Entry.Headword() method to HeadwordText() for clarity (returns first non-empty headword for logging/display)
    • Updated all call sites throughout codebase
    • Added Headword to CRDT model configuration as ignored property (not persisted)
  • Frontend Integration:

    • Updated TypeScript types to include headword property in IEntry
    • Modified writing-system-service to prefer pre-computed headword with fallback to raw forms
    • Updated demo data and test fixtures
  • Testing:

    • Added comprehensive HeadwordSearchValueTests covering token matching, citation form priority, and multi-WS scenarios
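
The headword rule listed under Entry Model Enhancement (CitationForm first, otherwise LeadingToken + LexemeForm + TrailingToken, for every writing system present in either form) can be sketched in TypeScript as follows. This is only an illustration of the rule, not the C# `ComputeHeadwords()` from the PR. The `MultiString`, `MorphTokens`, and `EntryForms` shapes are hypothetical, and skipping writing systems whose forms are empty is an assumption.

```typescript
// Hypothetical shapes for illustration; the real Entry model lives in C#.
type MultiString = Record<string, string>;

interface MorphTokens {
  leadingToken?: string; // e.g. "-" in front of a suffix
  trailingToken?: string; // e.g. "-" after a prefix
}

interface EntryForms {
  citationForm: MultiString;
  lexemeForm: MultiString;
}

function computeHeadwords(entry: EntryForms, tokens: MorphTokens): MultiString {
  const headwords: MultiString = {};
  // Iterate the keys of both forms rather than a list of current vernacular WSs.
  const wsIds = Object.keys(entry.citationForm).concat(Object.keys(entry.lexemeForm));
  for (const wsId of wsIds) {
    if (wsId in headwords) continue; // skip writing systems already resolved
    const citation = entry.citationForm[wsId];
    if (citation) {
      headwords[wsId] = citation; // CitationForm takes priority, no tokens added
      continue;
    }
    const lexeme = entry.lexemeForm[wsId];
    // Assumption: tokens are never emitted on their own for an empty lexeme form.
    if (lexeme) {
      headwords[wsId] = (tokens.leadingToken ?? "") + lexeme + (tokens.trailingToken ?? "");
    }
  }
  return headwords;
}
```

For a suffix entry with lexeme form `{ en: "ing" }`, no citation form, and a leading token of `"-"`, this sketch yields `{ en: "-ing" }`.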

Notable Implementation Details

  • Headword computation respects all writing systems present in the data, not just current vernacular WSs, ensuring no data loss for non-current or future writing systems
  • The Headword property is computed in-memory during entry loading and not persisted to the database
  • TODO comments indicate future work to integrate actual MorphTypeData from CRDT once it's implemented as a CRDT entity
  • Search filtering now uses per-writing-system headwords for more accurate results
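
As a rough picture of the search-record change described under Search Service Updates (headwords for all vernacular writing systems, space-separated), the field could be assembled like this. The function name and parameter shapes are hypothetical, and dropping empty values is an assumption; the actual `ToEntrySearchRecord()` is C#.

```typescript
// Hypothetical helper: builds the headword field of a search record by
// joining the pre-computed headwords of the given vernacular writing systems.
function searchRecordHeadword(
  headword: Record<string, string>,
  vernacularWsIds: string[]
): string {
  return vernacularWsIds
    .map((wsId) => headword[wsId] ?? "")
    .filter((value) => value.length > 0) // assumption: empty headwords are omitted
    .join(" ");
}
```

With headwords `{ en: "-ing", es: "-ando" }` and writing systems `["en", "fr", "es"]`, the sketch produces `"-ing -ando"`.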

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f

@github-actions bot added the 💻 FW Lite label (issues related to the fw lite application, not miniLcm or crdt related) Mar 10, 2026
@coderabbitai bot commented Mar 10, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


claude added 5 commits March 10, 2026 14:37

Add a computed `Headword` MultiString property to Entry that provides
display-ready headwords across all vernacular writing systems, incorporating
morph type tokens (leading/trailing markers like "-" for affixes).

Key changes:
- Entry.Headword: new MultiString property (not persisted, computed after load)
- Entry.Headword() renamed to Entry.HeadwordText() to avoid conflict
- EntryQueryHelpers: SQL expression and in-memory computation with morph tokens
- MiniLcmRepository: populates Headword after Finalize() for CRDT path
- FwDataMiniLcmApi: populates Headword using LCM IMoMorphType prefix/postfix
- EntrySearchService: FTS headword field now includes all vernacular WS values,
  filtering/sorting uses per-WS headword expression
- Frontend: prefers pre-computed entry.headword with citationForm/lexemeForm fallback
- LcmCrdtKernel: Headword ignored in EF Core entity configuration
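
The frontend preference mentioned above (pre-computed `entry.headword` first, then `citationForm`, then `lexemeForm`) might look roughly like this. `EntryLike` and `displayHeadword` are hypothetical names, and treating `headword` as optional is an assumption for entries that have not been finalized.

```typescript
// Hypothetical, trimmed-down view of the frontend entry type.
interface EntryLike {
  headword?: Record<string, string>; // assumption: may be absent on older data
  citationForm: Record<string, string>;
  lexemeForm: Record<string, string>;
}

// Prefer the pre-computed headword; fall back to the raw forms.
// Note that the fallback values carry no morph tokens.
function displayHeadword(entry: EntryLike, wsId: string): string {
  return entry.headword?.[wsId] || entry.citationForm[wsId] || entry.lexemeForm[wsId] || "";
}
```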

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f

- FwDataMiniLcmApi: use `result.Headword =` assignment instead of void
  PopulateHeadword method
- Move headword population into Finalize() overload so it's part of
  the standard entry finalization flow
- Restore removed comments about WS ordering and #1284
- Add TODO noting HeadwordText() fallback doesn't apply morph tokens
- Pre-compute headword onto FilterProjection as `let` variable to avoid
  redundant SQL expression evaluation in sorting/matching
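
For reference, `HeadwordText()` is described in the PR as returning the first non-empty headword, with a fallback that does not apply morph tokens. A minimal TypeScript sketch of that behaviour follows. The fallback order (citation form, then lexeme form) and the use of object key order as the writing-system order are assumptions, and all names are hypothetical.

```typescript
// Hypothetical entry shape for illustration.
interface LoggableEntry {
  headword: Record<string, string>;
  citationForm: Record<string, string>;
  lexemeForm: Record<string, string>;
}

// Returns the first non-empty value in key order, or undefined if none exists.
function firstNonEmpty(values: Record<string, string>): string | undefined {
  for (const key of Object.keys(values)) {
    const value = values[key];
    if (value) return value;
  }
  return undefined;
}

function headwordText(entry: LoggableEntry): string {
  return (
    firstNonEmpty(entry.headword) ??
    // Fallback path: raw forms, so no morph tokens are applied here.
    firstNonEmpty(entry.citationForm) ??
    firstNonEmpty(entry.lexemeForm) ??
    ""
  );
}
```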

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f

- ComputeHeadwords: iterate CitationForm/LexemeForm keys instead of a
  passed-in vernacular WS list, so non-current WSs get headwords too
- Same change for FwDataMiniLcmApi.ComputeHeadword
- Remove GetVernacularWritingSystems() from MiniLcmRepository (no longer needed)
- Simplify Finalize overload signature (drop vernacularWritingSystems param)
- Add headword match to FilterInternal WHERE clause so entries with
  morph-token-decorated headwords aren't excluded by the post-filter
- Add TODO to Filtering.cs for the same pattern
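
To see why the post-filter needs a headword match, consider a suffix whose lexeme form is "ing" and whose headword is "-ing". A query for "-ing" matches neither raw form, so a filter that only checks raw forms would drop the entry. The sketch below is an in-memory analogue of that WHERE clause, not the actual `FilterInternal` query; the names and the plain substring comparison are assumptions.

```typescript
// Hypothetical entry shape for illustration.
interface FilterableEntry {
  headword: Record<string, string>;
  citationForm: Record<string, string>;
  lexemeForm: Record<string, string>;
}

// An entry passes if the query occurs in the headword OR in either raw form
// for the given writing system.
function matchesQuery(entry: FilterableEntry, wsId: string, query: string): boolean {
  const candidates = [entry.headword[wsId], entry.citationForm[wsId], entry.lexemeForm[wsId]];
  return candidates.some((value) => value !== undefined && value.indexOf(query) >= 0);
}
```

With the suffix entry above, `matchesQuery(entry, "en", "-ing")` is true only because the headword is among the candidates.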

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f

- Merge Entry Finalize overloads into one that always requires
  morphTypeDataLookup — an entry can't be finalized without it
- Add commented-out HeadwordSearchValue expression and MorphTypeData
  JOIN patterns showing exactly how morph tokens will work in SQL
  once MorphTypeData becomes a CRDT entity
- Update TODOs in Filtering.cs and EntrySearchService.cs to reference
  the new HeadwordSearchValue pattern (all WSs, not per-wsId)

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f

Uncomment HeadwordSearchValue so the compiler validates types and
the test can validate SQL translation when run locally.

Nothing calls HeadwordSearchValue in production yet — it's ready for
when MorphTypeData becomes a CRDT entity.

HeadwordSearchValueTests covers:
- Morph token string concat inside json_each subquery
- CitationForm priority over LexemeForm
- Non-primary WS matching (the gap identified in review)

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f
@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from 13a4e0e to 7723717 Compare March 10, 2026 14:37
@github-actions bot (Contributor) commented Mar 10, 2026

UI unit Tests

  1 files  ±0   54 suites  ±0   25s ⏱️ ±0s
140 tests ±0  140 ✅ ±0  0 💤 ±0  0 ❌ ±0 
207 runs  ±0  207 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 7723717. ± Comparison against base commit 3f62d45.

♻️ This comment has been updated with latest results.

@github-actions bot (Contributor) commented

C# Unit Tests

162 tests  ±0   162 ✅ ±0   19s ⏱️ -1s
 23 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 7723717. ± Comparison against base commit 3f62d45.

@argos-ci bot commented Mar 10, 2026

Build: default | Status: ✅ No changes detected | Details: - | Updated (UTC): Mar 10, 2026, 2:41 PM
