Add pre-computed headwords with morph tokens to Entry model #2202
Draft
Add a computed `Headword` MultiString property to Entry that provides display-ready headwords across all vernacular writing systems, incorporating morph type tokens (leading/trailing markers like "-" for affixes).

Key changes:
- Entry.Headword: new MultiString property (not persisted, computed after load)
- Entry.Headword() renamed to Entry.HeadwordText() to avoid conflict
- EntryQueryHelpers: SQL expression and in-memory computation with morph tokens
- MiniLcmRepository: populates Headword after Finalize() for CRDT path
- FwDataMiniLcmApi: populates Headword using LCM IMoMorphType prefix/postfix
- EntrySearchService: FTS headword field now includes all vernacular WS values; filtering/sorting uses per-WS headword expression
- Frontend: prefers pre-computed entry.headword with citationForm/lexemeForm fallback
- LcmCrdtKernel: Headword ignored in EF Core entity configuration

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f
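The frontend fallback described in the last-but-one bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `EntryLike`, `MultiString`, and `displayHeadword` are hypothetical names, and trimming whitespace before the emptiness check is an assumption.

```typescript
// Hypothetical shapes for illustration only; the real IEntry has more fields.
type MultiString = Record<string, string | undefined>;

interface EntryLike {
  headword?: MultiString; // pre-computed on the backend, includes morph tokens
  citationForm: MultiString;
  lexemeForm: MultiString;
}

// Return the display headword for one writing system, preferring the
// pre-computed value and falling back to the raw forms when it is missing.
function displayHeadword(entry: EntryLike, wsId: string): string {
  const precomputed = entry.headword?.[wsId]?.trim();
  if (precomputed) return precomputed;

  const citation = entry.citationForm[wsId]?.trim();
  if (citation) return citation;

  // The raw lexeme form carries no morph tokens, so this path is only a fallback.
  return entry.lexemeForm[wsId]?.trim() ?? "";
}
```

Note that the fallback path returns undecorated forms, so an entry loaded without a pre-computed headword would display without its affix markers.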
- FwDataMiniLcmApi: use `result.Headword =` assignment instead of void PopulateHeadword method
- Move headword population into Finalize() overload so it's part of the standard entry finalization flow
- Restore removed comments about WS ordering and #1284
- Add TODO noting HeadwordText() fallback doesn't apply morph tokens
- Pre-compute headword onto FilterProjection as `let` variable to avoid redundant SQL expression evaluation in sorting/matching

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f
- ComputeHeadwords: iterate CitationForm/LexemeForm keys instead of a passed-in vernacular WS list, so non-current WSs get headwords too
- Same change for FwDataMiniLcmApi.ComputeHeadword
- Remove GetVernacularWritingSystems() from MiniLcmRepository (no longer needed)
- Simplify Finalize overload signature (drop vernacularWritingSystems param)
- Add headword match to FilterInternal WHERE clause so entries with morph-token-decorated headwords aren't excluded by the post-filter
- Add TODO to Filtering.cs for the same pattern

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f
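As a rough illustration of the key-iteration change, the sketch below computes headwords over the union of CitationForm and LexemeForm keys rather than a passed-in writing system list. It is written in TypeScript for brevity and is not the C# implementation: `MorphTokens` and `computeHeadwords` are hypothetical names, and applying the tokens to whichever form is chosen (citation or lexeme) is an assumption, since this conversation does not state whether citation forms are decorated.

```typescript
type MultiString = Record<string, string>;

// Hypothetical stand-in for the morph type data (leading/trailing markers).
interface MorphTokens {
  leadingToken?: string;
  trailingToken?: string;
}

// Compute a headword for every writing system that has a citation form or a
// lexeme form, so writing systems outside the current list are covered too.
function computeHeadwords(
  citationForm: MultiString,
  lexemeForm: MultiString,
  tokens: MorphTokens = {},
): MultiString {
  const result: MultiString = {};
  const wsIds = Object.keys(citationForm).concat(Object.keys(lexemeForm));

  for (const wsId of wsIds) {
    if (wsId in result) continue; // key appeared in both forms, already handled

    // Citation form takes priority over lexeme form.
    const form = citationForm[wsId]?.trim() || lexemeForm[wsId]?.trim();
    if (!form) continue;

    // Assumption: tokens decorate whichever form was chosen.
    result[wsId] = (tokens.leadingToken ?? "") + form + (tokens.trailingToken ?? "");
  }
  return result;
}

// Usage: a suffix entry with a lexeme form in two writing systems.
const suffixHeadwords = computeHeadwords({}, { en: "ing", fr: "ant" }, { leadingToken: "-" });
console.log(suffixHeadwords);
```

Because the loop is driven by the forms' own keys, a writing system that has been removed from the project's current list but still has data on the entry keeps its headword.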
- Merge Entry Finalize overloads into one that always requires morphTypeDataLookup, since an entry can't be finalized without it
- Add commented-out HeadwordSearchValue expression and MorphTypeData JOIN patterns showing exactly how morph tokens will work in SQL once MorphTypeData becomes a CRDT entity
- Update TODOs in Filtering.cs and EntrySearchService.cs to reference the new HeadwordSearchValue pattern (all WSs, not per-wsId)

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f
Uncomment HeadwordSearchValue so the compiler validates types and the test can validate SQL translation when run locally. Nothing calls HeadwordSearchValue in production yet; it's ready for when MorphTypeData becomes a CRDT entity.

HeadwordSearchValueTests covers:
- Morph token string concat inside json_each subquery
- CitationForm priority over LexemeForm
- Non-primary WS matching (the gap identified in review)

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f
13a4e0e to 7723717
Summary
This PR adds a pre-computed `Headword` property to the `Entry` model that includes morphological tokens (leading/trailing markers for affixes) applied to lexeme forms. The headword is computed during entry loading and made available across all writing systems, enabling better search and display functionality.

Key Changes
Entry Model Enhancement:
- Added `Headword` property to `Entry` as a `MultiString` containing pre-computed headwords for all writing systems with morph tokens applied (`LeadingToken + LexemeForm + TrailingToken`)

Headword Computation:
- `ComputeHeadwords()` method in `EntryQueryHelpers` for in-memory computation
- `HeadwordWithTokens()` and `HeadwordSearchValue()` expression methods for SQL translation
- Headword population runs as part of `QueryHelpers.Finalize()`

Search Service Updates:
- `EntrySearchService.Filter()` updated to accept a `WritingSystemId` parameter
- `ToEntrySearchRecord()` updated to include headwords for all vernacular writing systems (space-separated)

Data Bridge Integration:
- `FwDataMiniLcmApi.FromLexEntry()` updated to compute headwords using LCM morph type data
- `ComputeHeadword()` helper that mirrors the CRDT computation logic

API Updates:
- Renamed `Entry.Headword()` method to `HeadwordText()` for clarity (returns the first non-empty headword for logging/display)
- Added `Headword` to the CRDT model configuration as an ignored property (not persisted)

Frontend Integration:
- `headword` property in `IEntry`
- `writing-system-service` updated to prefer the pre-computed headword with fallback to raw forms

Testing:
- `HeadwordSearchValueTests` covering token matching, citation form priority, and multi-WS scenarios

Notable Implementation Details
- The `Headword` property is computed in-memory during entry loading and not persisted to the database
- Morph tokens in SQL are expected to use `MorphTypeData` from CRDT once it's implemented as a CRDT entity

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f
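To make the "space-separated" search field concrete, here is a small sketch of how headwords from every writing system could be joined into one searchable string. It is only an illustration in TypeScript: `headwordSearchField` and `matchesAnyHeadword` are hypothetical names, and the plain case-sensitive substring match is a stand-in assumption for the real FTS and SQL matching.

```typescript
type MultiString = Record<string, string>;

// Join every non-empty headword into one space-separated search field,
// so a query can hit a headword in any writing system.
function headwordSearchField(headword: MultiString): string {
  return Object.keys(headword)
    .map((wsId) => headword[wsId].trim())
    .filter((value) => value.length > 0)
    .join(" ");
}

// Assumption: simple substring matching stands in for the actual FTS query.
function matchesAnyHeadword(headword: MultiString, query: string): boolean {
  return headwordSearchField(headword).indexOf(query) !== -1;
}

// Usage: the "fr" headword is matched even if "en" is the primary writing system.
const sampleHeadwords: MultiString = { en: "-ing", fr: "-ant" };
console.log(headwordSearchField(sampleHeadwords));
console.log(matchesAnyHeadword(sampleHeadwords, "-ant"));
```

Including the morph tokens in the joined value is what lets a query such as "-ing" match the decorated headword rather than only the bare lexeme form.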