Implement homograph numbers and improve entry search/sort behavior #2220

Open
myieye wants to merge 8 commits into feat/sync-morph-types from claude/add-homograph-numbers-V1a05

Conversation

@myieye (Collaborator) commented Mar 24, 2026

Summary

This PR auto-assigns homograph numbers to entries with duplicate headwords and improves entry search and sorting so that morph tokens and secondary ordering are handled correctly.

Key Changes

Homograph Number Management

  • Added HomographNumber property to Entry model to distinguish entries with identical headwords
  • Implemented auto-assignment logic in CrdtMiniLcmApi.AssignHomographNumber() that:
    • Groups entries by headword and MorphType.SecondaryOrder
    • Auto-assigns sequential numbers (1, 2, 3...) when creating new entries
    • Promotes existing lone entries from 0 to 1 when a duplicate is created
    • Respects explicitly set homograph numbers
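The assignment rules above can be sketched roughly as follows. This is a simplified illustration in TypeScript, not the code in CrdtMiniLcmApi.AssignHomographNumber(): the `HomographEntry` shape and the in-memory `existing` array are assumptions made for the example (the real implementation queries the database), and the behavior for groups that already contain several unnumbered entries is a guess.

```typescript
// Hypothetical, simplified entry shape; the real Entry model has many more fields.
interface HomographEntry {
  id: string;
  headword: string;
  secondaryOrder: number; // stands in for MorphType.SecondaryOrder
  homographNumber: number; // 0 = unset
}

// Returns the homograph number for `created`. May promote one existing
// unnumbered entry in the same group from 0 to 1 as a side effect.
function assignHomographNumber(existing: HomographEntry[], created: HomographEntry): number {
  // An explicitly set number is respected as-is.
  if (created.homographNumber !== 0) return created.homographNumber;

  // Homographs share both the headword and the secondary order.
  const group = existing.filter(
    (e) => e.headword === created.headword && e.secondaryOrder === created.secondaryOrder,
  );
  // No other entry with this headword: the new entry stays unnumbered.
  if (group.length === 0) return 0;

  let highest = 0;
  for (const e of group) highest = Math.max(highest, e.homographNumber);

  if (highest === 0) {
    // A lone, unnumbered entry becomes homograph 1.
    const lone = group.filter((e) => e.homographNumber === 0)[0];
    if (lone) lone.homographNumber = 1;
    highest = 1;
  }

  created.homographNumber = highest + 1;
  return created.homographNumber;
}
```

With one existing unnumbered "bank" entry, creating a second "bank" in the same secondary-order group promotes the first to 1 and gives the new entry 2.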

Search and Filtering Improvements

  • Morph Token Handling: Search now properly strips morph tokens (prefixes/suffixes) from lexeme forms before matching, while preserving them in headword display
  • Citation Form Override: Citation forms now correctly override lexeme forms with morph tokens in search results
  • Lexeme Form Matching: Fixed search to match against lexeme forms (not just headwords), enabling discovery of entries where only the lexeme matches the query
  • Secondary Order Integration: Sorting now incorporates MorphType.SecondaryOrder to group related morphological variants together
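As a rough illustration of the morph token handling described above, the sketch below strips a leading or trailing token from a token-decorated form before matching, and matches citation forms verbatim. The `MorphTokens` and `SearchableEntry` shapes, the field names, and the example token "-" are assumptions for this example, not the actual EntrySearchService API.

```typescript
// Hypothetical shapes; real morph-type data and entries are richer.
interface MorphTokens {
  leading: string; // e.g. "-" for a suffix such as "-ing" (assumed example)
  trailing: string; // e.g. "-" for a prefix such as "un-" (assumed example)
}

interface SearchableEntry {
  lexemeWithTokens: string; // lexeme form as displayed, e.g. "-ing"
  citationForm?: string;
  tokens: MorphTokens;
}

// Removes the morph type's leading/trailing token, if present.
function stripMorphTokens(text: string, tokens: MorphTokens): string {
  let result = text.trim();
  if (tokens.leading !== "" && result.slice(0, tokens.leading.length) === tokens.leading) {
    result = result.slice(tokens.leading.length);
  }
  if (tokens.trailing !== "" && result.slice(-tokens.trailing.length) === tokens.trailing) {
    result = result.slice(0, result.length - tokens.trailing.length);
  }
  return result;
}

// Case-insensitive "contains" match: the lexeme is compared without tokens,
// the citation form is compared exactly as written.
function matchesQuery(entry: SearchableEntry, query: string): boolean {
  const q = query.trim().toLowerCase();
  if (q === "") return false;
  const lexeme = stripMorphTokens(entry.lexemeWithTokens, entry.tokens).toLowerCase();
  if (lexeme.indexOf(q) >= 0) return true;
  const citation = (entry.citationForm ?? "").toLowerCase();
  return citation !== "" && citation.indexOf(q) >= 0;
}
```

Under these assumptions, a query of "ing" finds the suffix "-ing", while a query consisting only of "-" does not match it.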

Sorting Enhancements

  • Added ApplyHeadwordOrder() method to apply consistent headword-based sorting with secondary order consideration
  • Updated ApplyRoughBestMatchOrder() to include secondary order and homograph number in sort criteria
  • Search relevance ranking now properly considers:
    • Headword matches vs. other field matches
    • Prefix matches vs. contains matches
    • Match length (shorter is better)
    • Secondary order (for grouping morphological variants)
    • Homograph number (for stable ordering of duplicates)
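The ranking criteria listed above could be expressed as a comparator along these lines. This is only a sketch: `RankedEntry` is a hypothetical shape, "match length" is interpreted here as the length of the matching headword, and entries that match only in other fields are assumed to have already passed the filter.

```typescript
// Hypothetical shape for an entry that already passed the search filter.
interface RankedEntry {
  headword: string; // headword without morph tokens
  secondaryOrder: number;
  homographNumber: number;
}

// Builds a comparator for Array.prototype.sort; lower rank sorts first.
function byRelevance(query: string): (a: RankedEntry, b: RankedEntry) => number {
  const q = query.toLowerCase();
  const key = (e: RankedEntry) => {
    const hw = e.headword.toLowerCase();
    const at = hw.indexOf(q);
    return {
      otherField: at >= 0 ? 0 : 1, // headword matches before other-field matches
      notPrefix: at === 0 ? 0 : 1, // prefix matches before contains matches
      length: at >= 0 ? hw.length : 0, // shorter matching headwords first
    };
  };
  return (a, b) => {
    const ka = key(a);
    const kb = key(b);
    return (
      ka.otherField - kb.otherField ||
      ka.notPrefix - kb.notPrefix ||
      ka.length - kb.length ||
      a.secondaryOrder - b.secondaryOrder || // groups morphological variants
      a.homographNumber - b.homographNumber // stable order for duplicates
    );
  };
}
```

For the query "run", this ordering puts the two "run" homographs first (in homograph order), then "runner", then "rerun", and finally entries that matched only through another field.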

Data Model Changes

  • Added Headword computed property to Entry (pre-computed by backend, populated during finalization)
  • Added helper methods: HeadwordWithTokens(), SearchHeadwords(), ComputeHeadwords() in EntryQueryHelpers
  • Updated EntrySearchService.FilterAndRank() to use new ranking logic with secondary order support
  • Modified Filtering.cs to accept IQueryable<MorphType> for proper secondary order filtering
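To make the relationship between lexeme forms, citation forms, and the computed Headword concrete, here is a minimal sketch. It assumes the headword is the citation form when one is set, and otherwise the lexeme form wrapped in its morph type's tokens; the `HeadwordSource` shape and the `computeHeadword` name are hypothetical and do not mirror the EntryQueryHelpers signatures.

```typescript
// Hypothetical input shape for computing a display headword.
interface HeadwordSource {
  lexemeForm: string; // bare form without tokens, e.g. "ing"
  citationForm?: string;
  leadingToken: string; // assumed to come from the entry's morph type
  trailingToken: string;
}

function computeHeadword(source: HeadwordSource): string {
  const citation = (source.citationForm ?? "").trim();
  // A citation form overrides the lexeme form and is used verbatim,
  // without adding morph tokens.
  if (citation !== "") return citation;

  const lexeme = source.lexemeForm.trim();
  if (lexeme === "") return "";
  return source.leadingToken + lexeme + source.trailingToken;
}
```

Under these assumptions a suffix with lexeme form "ing" and leading token "-" displays as "-ing", unless a citation form replaces it.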

Test Coverage

  • Added comprehensive tests for homograph number auto-assignment
  • Added tests verifying morph tokens don't affect sort order
  • Added tests for search relevance with secondary order (both FTS and non-FTS)
  • Added tests for citation form override behavior
  • Added tests for morph token search functionality

API Updates

  • Updated EntrySearchService.Filter() to accept WritingSystemId parameter
  • Renamed Headword() method to HeadwordText() in various test helpers to distinguish from the new Headword property
  • Updated FwData bridge to compute headwords during entry conversion

Notable Implementation Details

  • Homograph number assignment uses a single DB query, following the same pattern as the existing sorting logic
  • Morph token stripping is applied only to lexeme form searches, not citation form searches
  • Secondary order defaults to Stem's order when an entry's MorphType is not found
  • The Headword property is computed by the backend and excluded from strict equality checks in sync tests
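The Stem fallback mentioned above can be pictured with a small lookup helper. The `MorphTypeLike` shape, the string `kind` values, and the final fallback of 0 when no Stem type exists are all assumptions for this sketch; the actual values come from the seeded morph types.

```typescript
// Hypothetical minimal morph-type record.
interface MorphTypeLike {
  kind: string; // e.g. "Stem", "Suffix" (assumed identifiers)
  secondaryOrder: number;
}

// Looks up the secondary order for an entry's morph type, falling back to
// Stem's order when the type is missing or unknown.
function secondaryOrderFor(kind: string | undefined, morphTypes: MorphTypeLike[]): number {
  const found = morphTypes.filter((m) => m.kind === kind)[0];
  if (found) return found.secondaryOrder;
  const stem = morphTypes.filter((m) => m.kind === "Stem")[0];
  // Assumption: use 0 if even the Stem type is unavailable.
  return stem ? stem.secondaryOrder : 0;
}
```

An entry whose morph type cannot be resolved is therefore grouped and sorted as if it were a stem.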

https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2

@coderabbitai (bot) commented Mar 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 03bd3f16-05ca-48b1-97a5-caee5bcac251

Walkthrough

This PR implements homograph number support to track and distinguish multiple entries sharing the same headword. It adds the HomographNumber property to Entry models, implements auto-assignment logic in CRDT, updates database schema, modifies sorting/search ordering, and includes comprehensive tests across both CRDT and FwData layers.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| API Property Mapping | `backend/FwLite/FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.cs`, `backend/FwLite/FwDataMiniLcmBridge/Api/UpdateProxy/UpdateEntryProxy.cs`, `backend/FwLite/MiniLcm/Models/Entry.cs`, `frontend/viewer/src/lib/dotnet-types/generated-types/MiniLcm/Models/IEntry.ts` | Added HomographNumber property to Entry models and proxies; mapped from LibLCM entries and exposed in frontend types; conditionally assigns when importing from FwData. |
| Homograph Auto-Assignment Logic | `backend/FwLite/LcmCrdt/CrdtMiniLcmApi.cs` | Added AssignHomographNumber helper to auto-assign homograph numbers during entry creation; derives secondary ordering from morph type, queries existing entries, patches first zero entry to 1, assigns incrementing numbers to new entries. |
| Sorting & Full-Text Search | `backend/FwLite/FwDataMiniLcmBridge/Api/Sorting.cs`, `backend/FwLite/LcmCrdt/Data/Sorting.cs`, `backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs` | Enabled HomographNumber as deterministic tie-breaker in headword ordering and best-match ranking across all three sorting implementations; re-enabled previously commented-out homograph tie-breaks. |
| Change Type & Sync | `backend/FwLite/LcmCrdt/Changes/CreateEntryChange.cs`, `backend/FwLite/MiniLcm/SyncHelpers/EntrySync.cs` | Added HomographNumber property to CreateEntryChange for persistence; added JSON patch operation generation when homograph numbers differ during sync. |
| Database Schema & Migrations | `backend/FwLite/LcmCrdt/Migrations/20260409130907_AddHomographNumbers.*`, `backend/FwLite/LcmCrdt/Migrations/LcmCrdtDbContextModelSnapshot.cs` | Added EF Core migration to create HomographNumber INTEGER column with default 0 on Entry table; updated model snapshot with new property configuration. |
| Test Data Verification | `backend/FwLite/LcmCrdt.Tests/Changes/ChangeDeserializationRegressionData.*`, `backend/FwLite/LcmCrdt.Tests/Data/MigrationTests_FromScriptedDb.*.verified.*`, `backend/FwLite/LcmCrdt.Tests/Data/SnapshotDeserializationRegressionData.*`, `backend/FwLite/LcmCrdt.Tests/Data/VerifyRegeneratedSnapshotsAfterMigrationFromScriptedDb.*`, `backend/FwLite/LcmCrdt.Tests/DataModelSnapshotTests.VerifyDbModel.verified.txt` | Updated all verified snapshot and test data files to include HomographNumber field (typically 0 for existing entries); added new entry fixture with homograph number 4. |
| Unit & Integration Tests | `backend/FwLite/MiniLcm.Tests/CreateEntryTestsBase.cs`, `backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs`, `backend/FwLite/FwLiteProjectSync.Tests/SyncTests.cs` | Added five new homograph auto-assignment tests validating incrementation, explicit values, morph-type grouping, citation-form grouping, and multi-entry scenarios; added integration test verifying homograph correction through two-sync cycles; updated sorting test fixtures with explicit homograph variants. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

📦 Lexbox

Suggested reviewers

  • rmunn

Poem

🐰 A homograph's number now takes its place,
Distinguishing entries with matching face,
Through sorting and sync they journey along,
Each one assigned where it belongs!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.21% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title accurately describes the main changes: implementing homograph numbers and improving entry search/sort behavior, which align with the core objectives. |
| Description check | ✅ Passed | The PR description comprehensively outlines the key changes, implementation details, and test coverage related to homograph number management and search/sorting improvements. |


@github-actions (bot) added the "💻 FW Lite" label (issues related to the fw lite application, not miniLcm or crdt related) on Mar 24, 2026
@myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from 4246a62 to 2e82fa6 on March 27, 2026 16:00
Base automatically changed from claude/add-lexeme-headwords-TowRX to feat/sync-morph-types on March 28, 2026 09:14
@myieye force-pushed the feat/sync-morph-types branch 3 times, most recently from e3ec8c7 to 87c5dad, on April 9, 2026 12:24
@myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from 8d2c6c6 to b95da37 on April 9, 2026 12:35
@github-actions (bot, Contributor) commented Apr 9, 2026

UI unit Tests

1 files ±0 · 59 suites ±0 · 27s ⏱️ -9s

| | Total | ✅ | 💤 | ❌ |
|---|---|---|---|---|
| tests | 176 ±0 | 176 ±0 | 0 ±0 | 0 ±0 |
| runs | 245 ±0 | 245 ±0 | 0 ±0 | 0 ±0 |

Results for commit 77a861f. ± Comparison against base commit 8ed192f.

♻️ This comment has been updated with latest results.

@argos-ci (bot) commented Apr 9, 2026

| Build | Status | Details | Updated (UTC) |
|---|---|---|---|
| default | ⚠️ Changes detected | 6 changed | Apr 10, 2026, 10:37 AM |

* Seed canonical morph-types into CRDT projects

- Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM)
- Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes
- Seed morph-types for existing projects in MigrateDb (before FTS refresh)
- Add EF migration to clear FTS table so headwords are rebuilt with morph tokens
- Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates

* Stop creating morph-types in tests. They're now prepopulated

* Stop printing verify diff content. It's too much.

* Seed morph types before API testing

* Add descriptions to canonical morph types

* Sync morph-types when importing, because they already exist in CRDT

* Verify our canonical morph-types match new fwdata projects

* Fix non-FTS relevance order with morph-tokens in query
@myieye force-pushed the feat/sync-morph-types branch from 87c5dad to 8ed192f on April 9, 2026 15:24
claude and others added 5 commits April 9, 2026 17:25
Add HomographNumber (int, 0 = unset) to the Entry model with full
round-trip support through CRDT, FwData bridge, and sync.

Key changes:
- Entry model: add HomographNumber property with Copy() support
- CreateEntryChange: persist HomographNumber in CRDT changes
- CrdtMiniLcmApi: auto-assign homograph numbers on entry creation
  when HomographNumber is 0, respecting SecondaryOrder scoping.
  Updates existing lone entries from 0→1 when a second homograph appears.
- FwDataMiniLcmApi: read HomographNumber from ILexEntry, set on create
- UpdateEntryProxy: bidirectional HomographNumber sync to LibLCM
- EntrySync: include HomographNumber in diff/patch operations
- Sorting: uncomment HomographNumber in CRDT sort and search queries
- Tests: uncomment sorting tests with HomographNumber, add auto-
  assignment tests, add sync test verifying LibLCM corrects numbers
  after entry deletion via two sync cycles

https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2
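
The EntrySync item in the commit message above (including HomographNumber in diff/patch operations) might look roughly like this. It is a sketch only: the `PatchOperation` type follows the general JSON Patch "replace" shape, and the "/HomographNumber" path and function name are assumptions rather than the actual EntrySync code.

```typescript
// Minimal JSON Patch style "replace" operation.
interface PatchOperation {
  op: "replace";
  path: string;
  value: number;
}

// Emits a patch operation only when the homograph numbers differ.
function homographNumberDiff(
  before: { homographNumber: number },
  after: { homographNumber: number },
): PatchOperation[] {
  if (before.homographNumber === after.homographNumber) return [];
  // The path below is an assumed name for the property.
  return [{ op: "replace", path: "/HomographNumber", value: after.homographNumber }];
}
```

An unchanged number produces no operations, so entries whose homograph number already agrees on both sides are not touched by this part of the diff.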
@myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from aaf00bb to 0db4516 on April 9, 2026 15:42
@myieye marked this pull request as ready for review on April 9, 2026 15:46
@myieye (Collaborator, Author) commented Apr 9, 2026

@coderabbitai full review

@coderabbitai (bot) commented Apr 9, 2026

✅ Actions performed

Full review triggered.

@myieye force-pushed the claude/add-homograph-numbers-V1a05 branch from 4be17e9 to 77a861f on April 10, 2026 10:34
@myieye force-pushed the feat/sync-morph-types branch from 8ed192f to 374b1b2 on April 10, 2026 11:32