Seed canonical morph types and regenerate search index by myieye · Pull Request #2219 · sillsdev/languageforge-lexbox

myieye · 2026-03-24T14:20:40Z

Resolves #2212

Summary

This PR ensures all canonical morph type definitions are seeded into new and existing CRDT projects, and regenerates the full-text search index to include morph-type prefix/postfix tokens in searchable headwords.

Key Changes

Added canonical morph type definitions (CanonicalMorphTypes.cs): Defines 20 standard morph types matching FieldWorks/LibLCM specifications with verified GUIDs, names, abbreviations, and affixes. Uses frozen dictionary for immutability and performance.
Automatic seeding on project creation and migration: Modified CrdtProjectsService and CurrentProjectService to seed canonical morph types when:
- Creating a new project with SeedNewProjectData: true
- Opening an existing project that lacks morph types (via MigrateDb)
Search index regeneration: Added migration 20260318120000_RegenerateSearchTableForMorphTypes that clears the EntrySearchRecord table to force FTS rebuild with updated morph-type affixes.
Legacy snapshot patching: Updated CrdtFwdataProjectSyncService to patch legacy snapshots missing morph types during sync operations, preventing duplicates when syncing with FwData.
Comprehensive test coverage: Added MorphTypeSeedingTests verifying:
- New projects receive all canonical morph types
- Existing projects without morph types get seeded on open
- Seeding is idempotent (no duplicates on repeated opens)
- All MorphTypeKind enum values are covered
Added integration test in Sena3SyncTests validating canonical definitions match FwData morph types.
Regression test versioning: Updated RegressionTestHelper to version 3 for snapshot compatibility.

Implementation Details

Morph types are seeded with fixed GUIDs matching SIL.LCModel constants, ensuring consistency across projects and with FieldWorks data.
Seeding uses PreDefinedData.PredefinedMorphTypes() to integrate with existing predefined data infrastructure.
Search index regeneration is safe: FTS table is lazily rebuilt on next query if cleared.
Sync operations detect and skip seeding if morph types already exist, preventing duplicates in legacy workflows.

https://claude.ai/code/session_01WDKE2vXP4gjMWjfn4cmL4p

coderabbitai · 2026-03-24T14:20:48Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 16bf8ea9-2885-4f4d-9f88-3a4759c06106

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR adds a canonical registry of morph types (CanonicalMorphTypes), integrates seeding of canonical morph types during project creation and migration, patches legacy snapshots with missing morph-type data from the CRDT, and rebuilds full-text search indexes. Tests are updated to use canonical morph types instead of creating inline instances.

Changes

Cohort / File(s)	Summary
Canonical Morph Types Definition `backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs`	Added new public class defining a frozen dictionary of canonical `MorphType` instances keyed by `MorphTypeKind`, with fixed GUIDs, localized names/abbreviations/descriptions, and affix-specific metadata (prefixes, postfixes, secondary order).
Seeding Integration `backend/FwLite/LcmCrdt/Objects/PreDefinedData.cs`, `backend/FwLite/LcmCrdt/CrdtProjectsService.cs`, `backend/FwLite/LcmCrdt/CurrentProjectService.cs`, `backend/FwLite/LcmCrdt.Tests/MiniLcmApiFixture.cs`	Added `PredefinedMorphTypes()` method to seed canonical morph types into `DataModel`; integrated seeding into project creation (`CrdtProjectsService`), project migration (`CurrentProjectService`), and test fixtures.
Legacy Snapshot Patching `backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs`	Added conditional logic to populate empty `MorphTypes` in legacy snapshots from current CRDT morph types before sync, ensuring pre-morph-type projects are upgraded.
Database Migration `backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.cs`, `backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.Designer.cs`	Added EF Core migration to clear `EntrySearchRecord` table, forcing full-text search rebuild to incorporate morph-type prefix/postfix tokens; designer file documents target model schema.
Fluent Assert Configuration `backend/FwLite/FwLiteProjectSync.Tests/FluentAssertGlobalConfig.cs`, `backend/FwLite/LcmCrdt.Tests/Data/MigrationTests.cs`	Added module initializer and test configuration to omit exception content during verification assertions, improving test readability.
Morph Type Seeding Test Suite `backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs`	Added comprehensive test class validating canonical morph-type seeding during project creation, migration, idempotency, and integrity (unique IDs, required fields, kind coverage).
Integration & Sync Tests `backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs`, `backend/FwLite/FwLiteProjectSync.Tests/SyncTests.cs`, `backend/FwLite/FwLiteProjectSync.Tests/Import/ResumableTests.cs`	Added tests for canonical morph-type validation against FwData, legacy snapshot sync behavior with empty morph types, and updated existing tests to reference canonical morph types instead of locally constructed instances.
Test Data Updates `backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs`, `backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs`, `backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs`, `backend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.cs`, `backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs`	Removed inline `MorphType` creation in tests and updated expected `SecondaryOrder` values to match canonical definitions (e.g., `Unknown: 0`, `BoundStem: 10`, `Suffix: 70`).
Test Infrastructure `backend/FwLite/LcmCrdt.Tests/OpenProjectTests.cs`, `backend/FwLite/LcmCrdt.Tests/Data/RegressionTestHelper.cs`	Minor test file naming and trailing-comma formatting updates; no behavioral changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Add canonical morph-types and refresh headwords #2212: This PR directly addresses the issue by implementing canonical morph-type seeding, legacy snapshot patching, and full-text search regeneration as required.

Possibly related PRs

Handle morph-type tokens: filtering / persistence #2202: Implements overlapping morph-type and token handling surfaces, including model refinements and test pattern updates.
Display morph type tokens in UI #2205: Exposes GetMorphTypes to the UI layer and adds frontend morph-type services to consume the canonical morph-type data seeded here.
Test snapshots & take them from CRDT projects during merge #1913: Also modifies CrdtFwdataProjectSyncService snapshot handling; coordinates with this PR's legacy snapshot patching logic.

Suggested labels

📦 Lexbox

Suggested reviewers

rmunn

Poem

🐰 Whiskers twitching with delight,
Canonical types now seeded just right!
From CRDT to search, they hop and they play,
Legacy snapshots—no dupes today! 🌱✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The pull request title 'Seed canonical morph types and regenerate search index' directly and accurately summarizes the main changes: seeding morph type definitions and regenerating the search index.
Description check	✅ Passed	The pull request description comprehensively explains the changes, key modifications, implementation details, and test coverage related to seeding morph types and search index regeneration.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/fix-morph-type-syncing-bQo8X

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-03-24T15:43:38Z

UI unit Tests

1 files ±0 56 suites ±0 28s ⏱️ ±0s
166 tests ±0 166 ✅ ±0 0 💤 ±0 0 ❌ ±0
233 runs ±0 233 ✅ ±0 0 💤 ±0 0 ❌ ±0

Results for commit 8a43b19. ± Comparison against base commit 6af9827.

♻️ This comment has been updated with latest results.

argos-ci · 2026-03-24T15:44:30Z

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build	Status	Details	Updated (UTC)
default (Inspect)	⚠️ Changes detected (Review)	6 changed	Apr 8, 2026, 3:59 PM

rmunn · 2026-03-31T05:59:06Z

Now working on getting tests to pass. Writing down my findings as I go, so this is going to be a little bit stream-of-consciousness.

Sorting order tests are acting non-deterministic, with different sort orders returning on different runs of the same test. I.e., one run of the test might produce the sort order [suffix, root, prefix] and the next run of the same test might produce [suffix, prefix, root].

I'm starting to suspect homograph numbers may be to blame. The MorphTokens_DoNotAffectSortOrder test creates three entries, "aaaa" (a root), "-aaaa" (a suffix), and "aaaa-" (a prefix). Those should not be treated as homographs... but if they are being given homograph numbers, then the order of the homograph numbers may entirely depend on the order in which the homographs are found.

... And I just noticed that the sorting tests are ending up sorting those three words in an order that matches the alphabetical ordering of their GUID. Which would be consistent with the theory that the homograph number is what's being sorted on here.

... But a good argument against that is that the tests that are failing are the LcmCrdt sorting tests, where the "sort by HomographNumber" lines are still commented out! Might have to search for a new theory.

rmunn · 2026-03-31T08:09:11Z

It looks like the predefined morph types aren't being seeded in the sorting tests, and that may be the cause of the failures.

rmunn · 2026-03-31T08:37:30Z

The issue for the sort tests was that the morph types, and other prepopulated data, are populated in CrdtProjectsService.CreateProject, which calls SeedSystemData. But there's a comment in MiniLcmApiFixture explaining that it doesn't call CrdtProjectsService.CreateProject because that ends up opening and closing the DB context (due to the using var db = ... statement in CreateProject), which would end up erasing/resetting the in-memory DB. Instead, MiniLcmApiFixture reproduces the steps that CreateProject goes through, except for seeding initial data such as writing systems.

I added a step to MiniLcmApiFixture to seed morph types (but not other initial data like semantic domains or parts of speech), and now the tests appear to be passing because the sort order logic can actually look up the morph types and get their SecondaryOrder.

The earlier test failures were because the morph types weren't present, so the secondary order was always 0 (the order of Stem), and so the DB found records that were exactly equal in all clauses of its ORDER BY statement, and was therefore allowed to return them in arbitrary order. Which also explains why the sorting tests were failing non-deterministically, because the order of the rows returned by the DB was non-deterministic — probably the order that the rows were stored in the SQLite database, which would depend on a host of factors.

rmunn · 2026-04-01T05:49:56Z

Sena-3 sync test errors are because the predefined morph types that Claude generated include correct names and abbreviations, but do not include descriptions. So when the sena-3 project is synced from FW, it creates 19 changes (when 0 are expected) that just consist of bringing in the descriptions. I'll push a commit that includes descriptions in the predefined data, because if we're matching FW then let's match all of FW.

rmunn · 2026-04-01T06:06:40Z

Note that the descriptions in Sena-3 do not completely match the descriptions found in new FW projects. Specifically, the description for Stem in Sena-3 is a single run, but in new FW projects, the description for Stem includes some runs with the "Emphasized Text" style. The text is the same as far as I can see, it's just that some of the words were emphasized at some point in the past.

Which means that when we sync projects that are not sena-3, we should expect at least one change in morph types, as the .fwdata file will contain a different Stem description (with different numbers of runs) than what I'm currently putting into the CanonicalMorphTypes data. Ways we could handle that:

Include those styled runs in the CanonicalMorphTypes description for Stem, and update our sena-3.fwdata test data to match the description from liblcm's Templates/NewLangProj.fwdata file so that Sena-3 sync tests continue to pass.
Include those styled runs in the CanonicalMorphTypes description for Stem, do not update our sena-3.fwdata test data, and instead have the Sena-3 sync tests expect one change rather than 0 (or have them expect no more than 1 change).
Do not include those styled runs in the CanonicalMorphTypes description for Stem, and don't worry about the fact that every sync of a real project will include a single change to a morph type description (but only once).

rmunn · 2026-04-01T06:51:46Z

I believe the reordering of the entries in the project snapshots may be because .GetEntries() is returning entries in a sort order depending on morph types, and we might just want to reorder the entries in the test data snapshots that we store in the repo to match the new sort order. (Which would show up in this PR because now there are morph types in a CRDT project from the start).

But the change in commit 1e94608 is good enough for now.

Tests are passing; remaining failing check is Argos complaining about the visual change of including morph type prefixes in the UI, and I don't seem to have the access permissions to tell Argos "Yes, that's the new standard of what it should look like". Everything except that is green, so tests are all passing now and PR can be marked ready.

coderabbitai

🧹 Nitpick comments (1)

backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs (1)
11-11: Avoid exposing mutable canonical instances globally.

All is frozen, but each MorphType value is still mutable and shared process-wide. Please expose immutable definitions (or clone on access) so callers cannot mutate canonical state accidentally.

As per coding guidelines "Prefer Records for DTOs and immutable data structures in C# code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs` at line 11, The static
field CanonicalMorphTypes.All currently exposes shared mutable MorphType
instances; update the API so callers cannot mutate canonical state by either
converting MorphType into an immutable type (preferably a record or readonly
struct) and rebuilding CreateAll() to produce those immutable instances, or keep
MorphType as-is but change CanonicalMorphTypes.All to store and return defensive
copies (e.g., expose a method/property that returns clones from CreateAll()/All
instead of the original instances). Locate CreateAll(), the MorphType type
definition, and the All field to implement one of these approaches and ensure
any consumers use the new immutable type or the clone-returning accessor.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs`:
- Line 11: The static field CanonicalMorphTypes.All currently exposes shared
mutable MorphType instances; update the API so callers cannot mutate canonical
state by either converting MorphType into an immutable type (preferably a record
or readonly struct) and rebuilding CreateAll() to produce those immutable
instances, or keep MorphType as-is but change CanonicalMorphTypes.All to store
and return defensive copies (e.g., expose a method/property that returns clones
from CreateAll()/All instead of the original instances). Locate CreateAll(), the
MorphType type definition, and the All field to implement one of these
approaches and ensure any consumers use the new immutable type or the
clone-returning accessor.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: df3c003c-8913-498d-bba3-1e2df8c429be

📥 Commits

Reviewing files that changed from the base of the PR and between e9f7255 and 1e94608.

📒 Files selected for processing (28)

backend/FwLite/FwDataMiniLcmBridge/Api/Sorting.cs
backend/FwLite/FwLiteProjectSync.Tests/FluentAssertGlobalConfig.cs
backend/FwLite/FwLiteProjectSync.Tests/Import/ResumableTests.cs
backend/FwLite/FwLiteProjectSync.Tests/ProjectSnapshotSerializationTests.cs
backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs
backend/FwLite/FwLiteProjectSync.Tests/SyncTests.cs
backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs
backend/FwLite/LcmCrdt.Tests/Changes/ChangeSerializationTests.cs
backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs
backend/FwLite/LcmCrdt.Tests/Data/MigrationTests.cs
backend/FwLite/LcmCrdt.Tests/Data/RegressionTestHelper.cs
backend/FwLite/LcmCrdt.Tests/Data/SnapshotDeserializationTests.cs
backend/FwLite/LcmCrdt.Tests/DataModelSnapshotTests.cs
backend/FwLite/LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.cs
backend/FwLite/LcmCrdt.Tests/MiniLcmApiFixture.cs
backend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.cs
backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs
backend/FwLite/LcmCrdt.Tests/OpenProjectTests.cs
backend/FwLite/LcmCrdt/CrdtProjectsService.cs
backend/FwLite/LcmCrdt/CurrentProjectService.cs
backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs
backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.Designer.cs
backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.cs
backend/FwLite/LcmCrdt/Objects/PreDefinedData.cs
backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs
backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs
backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs
backend/FwLite/Taskfile.yml

💤 Files with no reviewable changes (3)

backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs
backend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.cs
backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs

myieye · 2026-04-01T07:28:32Z

We should have consistent ordering of entries in our snapshots. It's just part of the Get entries API: The result is sorted. I had test failures in the branch I already merged due to sort order and fixed them by commenting out the HomographNumber from fwdata sorting. Because we fall back to sorting by ID, we get consistent sorting in both CRDT and FWDATA even without homographs numbers.

That's likely not all relevant. But, I don't think it's correct to not enforce strict ordering of entries.

I didn't read everything you've written.

rmunn

A few changes needed before we can merge.

backend/FwLite/FwLiteProjectSync.Tests/FluentAssertGlobalConfig.cs

backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs

backend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.cs

backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs

backend/FwLite/LcmCrdt.Tests/OpenProjectTests.cs

backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs

backend/FwLite/LcmCrdt/Objects/PreDefinedData.cs

backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs

rmunn

Looks good now, all test changes I felt were necessary have now been made.

rmunn · 2026-04-02T03:55:00Z

@CodeRabbit review

coderabbitai · 2026-04-02T03:55:08Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

rmunn · 2026-04-02T04:33:52Z

@CodeRabbit fullreview

coderabbitai · 2026-04-02T04:34:05Z

@rmunn Sure, I'll redo the full review of all the changes in this PR.

🧠 Learnings used

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 2192
File: backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs:125-127
Timestamp: 2026-03-11T04:58:21.676Z
Learning: In sillsdev/languageforge-lexbox backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs SyncInternal, the two-pass sync pattern is intentional and lossless: pass 1 (e.g., line 126 for MorphTypeData) updates the live CRDT from FW data using the project snapshot as "before"; pass 2 (e.g., line 127) reads the *already-updated* live CRDT and pushes it to FW. This means a null value in the CRDT snapshot cannot destructively wipe an FW value — it will be reconciled to the FW value in pass 1 before pass 2 ever runs. A null reaching FW in pass 2 only happens when the user intentionally deleted it in CRDT, which is correct bidirectional sync behavior.

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Applies to backend/FwLite/**/*.cs : Before making changes to MiniLcm, LcmCrdt, FwDataMiniLcmBridge, or FwLiteProjectSync: read the relevant section thoroughly, understand the sync flow end-to-end, run the full test suite with `dotnet test FwLiteOnly.slnf`, and test with real FwData projects

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Applies to backend/FwLite/**/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs : ProjectSnapshot must be regenerated from CRDT after sync completes. The order of sync operations is critical: WritingSystems → Publications → PartsOfSpeech → SemanticDomains → ComplexFormTypes → Entries (entries sync last as they depend on others). Complex forms sync is two-phase: SyncWithoutComplexFormsAndComponents first, then SyncComplexFormsAndComponents. Ensure `fwdataApi.Save()` is called after changes.

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Before making changes to MiniLcm, LcmCrdt, FwDataMiniLcmBridge, or FwLiteProjectSync: read the relevant section thoroughly, understand the sync flow end-to-end, run the full test suite with `dotnet test FwLiteOnly.slnf`, and test with real FwData projects

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Applies to backend/FwLite/**/FwLiteProjectSync.Tests/**/*.cs : When fixing a sync bug, reproduce it with a test. Use `DryRunMiniLcmApi` to see what changes would be made. Debug through `CrdtFwdataProjectSyncService.Sync()` and verify `ProjectSnapshot` handling.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 2192
File: backend/FwLite/LcmCrdt/Changes/CreateMorphTypeDataChange.cs:21-32
Timestamp: 2026-03-11T02:23:28.891Z
Learning: In sillsdev/languageforge-lexbox LcmCrdt, the MorphTypeData table's UNIQUE ON CONFLICT IGNORE constraint on the MorphType column is implemented in migration 20260306045826_AddMorphTypeDataTable.cs using migrationBuilder.Sql() with a hand-written CREATE TABLE DDL (since EF Core's CreateTable() API doesn't support ON CONFLICT clauses). A separate migrationBuilder.CreateIndex() for IX_MorphTypeData_MorphType is also emitted so EF Core's model snapshot stays consistent with HasIndex(...).IsUnique() configuration. A separate migration 20260310065457_UniqueIndexOnMorphType (which used CREATE UNIQUE INDEX without ON CONFLICT support) was deleted as part of this consolidation.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:33:24.170Z
Learning: In backend/FwLite/LcmCrdt, the MorphTypeData table has a unique index on the MorphType column (corresponding to the MorphType C# enum). The intended conflict resolution is "first one wins" using SQLite's ON CONFLICT IGNORE, because: (1) CRDT's NewEntity() cannot return null, so duplicate CreateMorphTypeDataChange objects (from concurrent offline users) result in a second row with DeletedAt set, potentially violating the unique constraint; (2) In practice, users cannot create new morph types in FieldWorks Classic or Lite. SQLite's ON CONFLICT IGNORE clause can only be applied to constraints defined inline in CREATE TABLE (not to separately-created indexes via CREATE UNIQUE INDEX), so implementing this requires either rebuilding the table with an inline UNIQUE ... ON CONFLICT IGNORE constraint or using an EF Core SaveChanges interceptor with INSERT OR IGNORE.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:50:35.872Z
Learning: In sillsdev/languageforge-lexbox LcmCrdt, when a SQLite table needs `UNIQUE ON CONFLICT IGNORE` on a column (e.g., MorphTypeData.MorphType), the EF Core migration must use `migrationBuilder.Sql()` for the CREATE TABLE DDL (since EF Core's CreateTable() API doesn't support ON CONFLICT clauses). A separate named `CREATE UNIQUE INDEX` should also be emitted via `migrationBuilder.Sql()` so EF Core's model snapshot stays consistent with `HasIndex(...).IsUnique()` configuration. SQLite allows both the inline constraint and the named index on the same column; the inline ON CONFLICT IGNORE fires first on duplicate inserts.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:34:03.262Z
Learning: In the LcmCrdt SQLite database for sillsdev/languageforge-lexbox, `ON CONFLICT IGNORE` for a unique constraint on MorphTypeData.MorphType cannot be implemented via a standalone `CREATE UNIQUE INDEX` statement (SQLite doesn't support ON CONFLICT clauses on CREATE INDEX). It must be a table-level constraint in the CREATE TABLE DDL, requiring a table-rebuild migration. The practical risk is low because morph type creation is restricted in both FieldWorks Classic and FW Lite, but the theoretical CRDT offline-sync race condition exists.

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs (1)

35-42: ⚠️ Potential issue | 🟡 Minor

Missing seeded Description parity assertion

This test validates most canonical fields but skips Description, so a description-mapping regression can pass unnoticed.

Suggested assertion addition

         foreach (var canonical in CanonicalMorphTypes.All.Values)
         {
             var mt = morphTypes.Should().ContainSingle(m => m.Kind == canonical.Kind).Subject;
             mt.Id.Should().Be(canonical.Id);
             mt.Name["en"].Should().Be(canonical.Name["en"]);
             mt.Abbreviation["en"].Should().Be(canonical.Abbreviation["en"]);
+            mt.Description["en"].Should().BeEquivalentTo(canonical.Description["en"]);
             mt.Prefix.Should().Be(canonical.Prefix);
             mt.Postfix.Should().Be(canonical.Postfix);
             mt.SecondaryOrder.Should().Be(canonical.SecondaryOrder);
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs` around lines 35 - 42,
The test in MorphTypeSeedingTests checks many canonical fields but omits
Description, so add an assertion to compare the seeded morph type's Description
for the same locale; after locating the existing assertions that inspect mt (the
Subject from morphTypes.Should().ContainSingle(m => m.Kind == canonical.Kind)),
add a check that mt.Description["en"].Should().Be(canonical.Description["en"])
(or the appropriate locale key used elsewhere) to ensure description parity
between the seeded mt and canonical.

🧹 Nitpick comments (1)

backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs (1)

114-117: Strengthen canonical parity check with Name and Abbreviation assertions.

The test currently validates IDs/affixes/order, but misses user-visible label drift.

Suggested test hardening

             var canonical = CanonicalMorphTypes.All[fwMorphType.Kind];
             canonical.Id.Should().Be(fwMorphType.Id, $"GUID for {fwMorphType.Kind} should match FwData");
+            canonical.Name["en"].Should().Be(fwMorphType.Name["en"], $"Name for {fwMorphType.Kind} should match FwData");
+            canonical.Abbreviation["en"].Should().Be(fwMorphType.Abbreviation["en"], $"Abbreviation for {fwMorphType.Kind} should match FwData");
             canonical.Prefix.Should().Be(fwMorphType.Prefix, $"Prefix for {fwMorphType.Kind} should match FwData");
             canonical.Postfix.Should().Be(fwMorphType.Postfix, $"Postfix for {fwMorphType.Kind} should match FwData");
             canonical.SecondaryOrder.Should().Be(fwMorphType.SecondaryOrder, $"SecondaryOrder for {fwMorphType.Kind} should match FwData");

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs` around lines 114 -
117, Add assertions to the existing parity checks to validate user-facing labels
by asserting canonical.Name.Should().Be(fwMorphType.Name, $"Name for
{fwMorphType.Kind} should match FwData") and
canonical.Abbreviation.Should().Be(fwMorphType.Abbreviation, $"Abbreviation for
{fwMorphType.Kind} should match FwData"); place these alongside the existing
Id/Prefix/Postfix/SecondaryOrder assertions in the Sena3SyncTests (where
canonical and fwMorphType are compared) so name and abbreviation drift will fail
the test.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/FwLite/LcmCrdt.Tests/Data/RegressionTestHelper.cs`:
- Around line 73-78: The RegressionVersion enum now includes v3 but tests and
test data are not updated; either add a v3.sql test fixture and wiring or remove
the enum. To fix: create the missing backend/FwLite/LcmCrdt.Tests/Data/v3.sql
with the expected scripted DB state, then update MigrationTests.cs to add
[InlineData(RegressionVersion.v3)] to both VerifyAfterMigrationFromScriptedDb
and VerifyRegeneratedSnapshotsAfterMigrationFromScriptedDb, ensuring the test
helpers (e.g., methods that map enum to file names) recognize "v3";
alternatively, if v3 is premature, remove the v3 member from the
RegressionVersion enum to restore consistency.

In `@backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs`:
- Around line 99-103: The test SeedingIsIdempotent in class
MorphTypeSeedingTests doesn't actually re-trigger seeding because MigrationTasks
is static, so either make the test explicitly force a second seed run by
resetting the static state (clear MigrationTasks) before calling MigrateDb a
second time, or rename the test to reflect that it only verifies duplicate-safe
counts after reopen; update the test to call MigrationTasks.Clear() (or the
actual static reset method/property) and then call MigrateDb again to assert
idempotence, or change the test name to e.g.,
SeedingDoesNotCreateDuplicatesOnReopen to match current behavior.

---

Duplicate comments:
In `@backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs`:
- Around line 35-42: The test in MorphTypeSeedingTests checks many canonical
fields but omits Description, so add an assertion to compare the seeded morph
type's Description for the same locale; after locating the existing assertions
that inspect mt (the Subject from morphTypes.Should().ContainSingle(m => m.Kind
== canonical.Kind)), add a check that
mt.Description["en"].Should().Be(canonical.Description["en"]) (or the
appropriate locale key used elsewhere) to ensure description parity between the
seeded mt and canonical.

---

Nitpick comments:
In `@backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs`:
- Around line 114-117: Add assertions to the existing parity checks to validate
user-facing labels by asserting canonical.Name.Should().Be(fwMorphType.Name,
$"Name for {fwMorphType.Kind} should match FwData") and
canonical.Abbreviation.Should().Be(fwMorphType.Abbreviation, $"Abbreviation for
{fwMorphType.Kind} should match FwData"); place these alongside the existing
Id/Prefix/Postfix/SecondaryOrder assertions in the Sena3SyncTests (where
canonical and fwMorphType are compared) so name and abbreviation drift will fail
the test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ba07c2f8-bf11-4a7f-b91f-abb9d34c74b9

📥 Commits

Reviewing files that changed from the base of the PR and between e9f7255 and b780ecc.

📒 Files selected for processing (27)

backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs
backend/FwLite/FwLiteProjectSync.Tests/FluentAssertGlobalConfig.cs
backend/FwLite/FwLiteProjectSync.Tests/Import/ResumableTests.cs
backend/FwLite/FwLiteProjectSync.Tests/ProjectSnapshotSerializationTests.cs
backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs
backend/FwLite/FwLiteProjectSync.Tests/SyncTests.cs
backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs
backend/FwLite/LcmCrdt.Tests/Changes/ChangeSerializationTests.cs
backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs
backend/FwLite/LcmCrdt.Tests/Data/MigrationTests.cs
backend/FwLite/LcmCrdt.Tests/Data/RegressionTestHelper.cs
backend/FwLite/LcmCrdt.Tests/Data/SnapshotDeserializationTests.cs
backend/FwLite/LcmCrdt.Tests/DataModelSnapshotTests.cs
backend/FwLite/LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.cs
backend/FwLite/LcmCrdt.Tests/MiniLcmApiFixture.cs
backend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.cs
backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs
backend/FwLite/LcmCrdt.Tests/OpenProjectTests.cs
backend/FwLite/LcmCrdt/CrdtProjectsService.cs
backend/FwLite/LcmCrdt/CurrentProjectService.cs
backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.Designer.cs
backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.cs
backend/FwLite/LcmCrdt/Objects/PreDefinedData.cs
backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs
backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs
backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs
backend/FwLite/Taskfile.yml

💤 Files with no reviewable changes (1)

backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs

backend/FwLite/LcmCrdt.Tests/Data/RegressionTestHelper.cs

backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs

Project snapshots for CRDT and FW are returning entries in different order, for some not-yet-identified reason. As long as the entries with the same ID compare the same, we should not fail the Sena-3 sync tests just because the snapshots have the entries in a different order. Once the tests pass, I may revert this commit and investigate why the entries in the snapshot are coming out in a different order.

No need to set "WithoutStrictOrderingFor(x -> x.Entries)" if the FW and CRDT APIs are sorting entries the same way. The difference in sorting was because the FW API had been set to sort by HomographNumber, but CRDT has not yet implemented HomographNumber so the sorting was different. Removing the sort by HomographNumber from the FW API results in the tests passing because snapshots are sorted in the same order again.

Test comments should reflect actual SecondaryOrder values from canonical morph types, so that we don't get confused later.

Check that descriptions got created since they're essential to having the correct number of changes in Sena-3 sync tests, and do a bit more to check that the SeedingIsIdempotent_OpeningProjectTwiceDoesNotDuplicate test is actually checking what we think it's checking.

Seeding the morph types does not actually create a change in the serialization format of project snapshots, so we don't actually need a RegressionVersion.v3 and corresponding tests for it.

backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs

myieye · 2026-04-07T15:46:47Z

@coderabbitai full review

coderabbitai · 2026-04-07T15:46:53Z

✅ Actions performed

Full review triggered.

* Seed canonical morph-types into CRDT projects - Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM) - Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes - Seed morph-types for existing projects in MigrateDb (before FTS refresh) - Add EF migration to clear FTS table so headwords are rebuilt with morph tokens - Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates * Stop creating morph-types in tests. They're now prepopulated * Stop printing verify diff content. It's too much. * Seed morph types before API testing * Add descriptions to canonical morph types * Sync morph-types when importing, because they already exist in CRDT * Verify our canonical morph-types match new fwdata projects * Fix non-FTS relevance order with morph-tokens in query

github-actions bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Mar 24, 2026

myieye mentioned this pull request Mar 24, 2026

Handle morph-type tokens: filtering / persistence #2202

Merged

myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from dc45a12 to 823531a Compare March 24, 2026 14:27

myieye force-pushed the claude/fix-morph-type-syncing-bQo8X branch from 46c8f87 to f8f4cfe Compare March 24, 2026 15:39

myieye changed the base branch from claude/add-lexeme-headwords-TowRX to feat/sync-morph-types March 24, 2026 15:40

myieye marked this pull request as ready for review March 24, 2026 15:40

myieye closed this Mar 24, 2026

myieye reopened this Mar 24, 2026

myieye marked this pull request as draft March 27, 2026 08:17

rmunn self-assigned this Mar 31, 2026

rmunn marked this pull request as ready for review April 1, 2026 06:51

coderabbitai bot reviewed Apr 1, 2026

View reviewed changes

rmunn requested changes Apr 2, 2026

View reviewed changes

rmunn approved these changes Apr 2, 2026

View reviewed changes

coderabbitai bot reviewed Apr 2, 2026

View reviewed changes

backend/FwLite/LcmCrdt.Tests/Data/RegressionTestHelper.cs Show resolved Hide resolved

backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs Outdated Show resolved Hide resolved

myieye force-pushed the feat/sync-morph-types branch from e9f7255 to 6af9827 Compare April 7, 2026 15:30

rmunn added 7 commits April 7, 2026 17:39

Add descriptions to canonical morph types

c87dbfb

Mention correct SecondaryOrder in test comments

7f13150

Test comments should reflect actual SecondaryOrder values from canonical morph types, so that we don't get confused later.

Address my own review comments

925efa8

Remove RegressionVersion value we don't need yet

bcbe1bc

Seeding the morph types does not actually create a change in the serialization format of project snapshots, so we don't actually need a RegressionVersion.v3 and corresponding tests for it.

myieye force-pushed the claude/fix-morph-type-syncing-bQo8X branch from 26cf94c to bcbe1bc Compare April 7, 2026 15:40

myieye commented Apr 7, 2026

View reviewed changes

backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs Outdated Show resolved Hide resolved

myieye added 11 commits April 8, 2026 14:55

Format

d04084f

Update comment

a931a7d

Update morph-type descriptions

fa2d764

Sync morph-types when importing, because they already exist in CRDT

32e39f8

Simplify test

9e753c2

Verify our canonical morph-types match new fwdata projects

b37f16f

Fix verified files

bb92b7c

Remove uninteresting tests

228c153

Minor refactor

1b319ec

Make NewLangProj.fwdata available to test.

73ae741

Fix non-FTS relevance order with morph-tokens in query

8a43b19

myieye merged commit c30abbb into feat/sync-morph-types Apr 9, 2026
13 of 14 checks passed

myieye deleted the claude/fix-morph-type-syncing-bQo8X branch April 9, 2026 07:14

myieye linked an issue Apr 9, 2026 that may be closed by this pull request

Add canonical morph-types and refresh headwords #2212

Closed

Uh oh!

Conversation

myieye commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Changes

Implementation Details

Uh oh!

coderabbitai bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

UI unit Tests

Uh oh!

argos-ci bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rmunn commented Mar 31, 2026

Uh oh!

rmunn commented Mar 31, 2026

Uh oh!

rmunn commented Mar 31, 2026

Uh oh!

rmunn commented Apr 1, 2026

Uh oh!

rmunn commented Apr 1, 2026

Uh oh!

rmunn commented Apr 1, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

myieye commented Apr 1, 2026

Uh oh!

rmunn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rmunn left a comment

Choose a reason for hiding this comment

Uh oh!

rmunn commented Apr 2, 2026

Uh oh!

coderabbitai bot commented Apr 2, 2026

Uh oh!

rmunn commented Apr 2, 2026

Uh oh!

coderabbitai bot commented Apr 2, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

myieye commented Apr 7, 2026

Uh oh!

coderabbitai bot commented Apr 7, 2026

Uh oh!

Uh oh!

myieye commented Mar 24, 2026 •

edited

Loading

coderabbitai bot commented Mar 24, 2026 •

edited

Loading

github-actions bot commented Mar 24, 2026 •

edited

Loading

argos-ci bot commented Mar 24, 2026 •

edited

Loading