Skip to content

Seed canonical morph types and regenerate search index#2219

Merged
myieye merged 28 commits intofeat/sync-morph-typesfrom
claude/fix-morph-type-syncing-bQo8X
Apr 9, 2026
Merged

Seed canonical morph types and regenerate search index#2219
myieye merged 28 commits intofeat/sync-morph-typesfrom
claude/fix-morph-type-syncing-bQo8X

Conversation

@myieye
Copy link
Copy Markdown
Collaborator

@myieye myieye commented Mar 24, 2026

Resolves #2212

Summary

This PR ensures all canonical morph type definitions are seeded into new and existing CRDT projects, and regenerates the full-text search index to include morph-type prefix/postfix tokens in searchable headwords.

Key Changes

  • Added canonical morph type definitions (CanonicalMorphTypes.cs): Defines 20 standard morph types matching FieldWorks/LibLCM specifications with verified GUIDs, names, abbreviations, and affixes. Uses frozen dictionary for immutability and performance.

  • Automatic seeding on project creation and migration: Modified CrdtProjectsService and CurrentProjectService to seed canonical morph types when:

    • Creating a new project with SeedNewProjectData: true
    • Opening an existing project that lacks morph types (via MigrateDb)
  • Search index regeneration: Added migration 20260318120000_RegenerateSearchTableForMorphTypes that clears the EntrySearchRecord table to force FTS rebuild with updated morph-type affixes.

  • Legacy snapshot patching: Updated CrdtFwdataProjectSyncService to patch legacy snapshots missing morph types during sync operations, preventing duplicates when syncing with FwData.

  • Comprehensive test coverage: Added MorphTypeSeedingTests verifying:

    • New projects receive all canonical morph types
    • Existing projects without morph types get seeded on open
    • Seeding is idempotent (no duplicates on repeated opens)
    • All MorphTypeKind enum values are covered

    Added integration test in Sena3SyncTests validating canonical definitions match FwData morph types.

  • Regression test versioning: Updated RegressionTestHelper to version 3 for snapshot compatibility.

Implementation Details

  • Morph types are seeded with fixed GUIDs matching SIL.LCModel constants, ensuring consistency across projects and with FieldWorks data.
  • Seeding uses PreDefinedData.PredefinedMorphTypes() to integrate with existing predefined data infrastructure.
  • Search index regeneration is safe: FTS table is lazily rebuilt on next query if cleared.
  • Sync operations detect and skip seeding if morph types already exist, preventing duplicates in legacy workflows.

https://claude.ai/code/session_01WDKE2vXP4gjMWjfn4cmL4p

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 16bf8ea9-2885-4f4d-9f88-3a4759c06106

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds a canonical registry of morph types (CanonicalMorphTypes), integrates seeding of canonical morph types during project creation and migration, patches legacy snapshots with missing morph-type data from the CRDT, and rebuilds full-text search indexes. Tests are updated to use canonical morph types instead of creating inline instances.

Changes

Cohort / File(s) Summary
Canonical Morph Types Definition
backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs
Added new public class defining a frozen dictionary of canonical MorphType instances keyed by MorphTypeKind, with fixed GUIDs, localized names/abbreviations/descriptions, and affix-specific metadata (prefixes, postfixes, secondary order).
Seeding Integration
backend/FwLite/LcmCrdt/Objects/PreDefinedData.cs, backend/FwLite/LcmCrdt/CrdtProjectsService.cs, backend/FwLite/LcmCrdt/CurrentProjectService.cs, backend/FwLite/LcmCrdt.Tests/MiniLcmApiFixture.cs
Added PredefinedMorphTypes() method to seed canonical morph types into DataModel; integrated seeding into project creation (CrdtProjectsService), project migration (CurrentProjectService), and test fixtures.
Legacy Snapshot Patching
backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs
Added conditional logic to populate empty MorphTypes in legacy snapshots from current CRDT morph types before sync, ensuring pre-morph-type projects are upgraded.
Database Migration
backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.cs, backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.Designer.cs
Added EF Core migration to clear EntrySearchRecord table, forcing full-text search rebuild to incorporate morph-type prefix/postfix tokens; designer file documents target model schema.
Fluent Assert Configuration
backend/FwLite/FwLiteProjectSync.Tests/FluentAssertGlobalConfig.cs, backend/FwLite/LcmCrdt.Tests/Data/MigrationTests.cs
Added module initializer and test configuration to omit exception content during verification assertions, improving test readability.
Morph Type Seeding Test Suite
backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs
Added comprehensive test class validating canonical morph-type seeding during project creation, migration, idempotency, and integrity (unique IDs, required fields, kind coverage).
Integration & Sync Tests
backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs, backend/FwLite/FwLiteProjectSync.Tests/SyncTests.cs, backend/FwLite/FwLiteProjectSync.Tests/Import/ResumableTests.cs
Added tests for canonical morph-type validation against FwData, legacy snapshot sync behavior with empty morph types, and updated existing tests to reference canonical morph types instead of locally constructed instances.
Test Data Updates
backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs, backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs, backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs, backend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.cs, backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs
Removed inline MorphType creation in tests and updated expected SecondaryOrder values to match canonical definitions (e.g., Unknown: 0, BoundStem: 10, Suffix: 70).
Test Infrastructure
backend/FwLite/LcmCrdt.Tests/OpenProjectTests.cs, backend/FwLite/LcmCrdt.Tests/Data/RegressionTestHelper.cs
Minor test file naming and trailing-comma formatting updates; no behavioral changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Suggested labels

📦 Lexbox

Suggested reviewers

  • rmunn

Poem

🐰 Whiskers twitching with delight,
Canonical types now seeded just right!
From CRDT to search, they hop and they play,
Legacy snapshots—no dupes today! 🌱✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title 'Seed canonical morph types and regenerate search index' directly and accurately summarizes the main changes: seeding morph type definitions and regenerating the search index.
Description check ✅ Passed The pull request description comprehensively explains the changes, key modifications, implementation details, and test coverage related to seeding morph types and search index regeneration.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/fix-morph-type-syncing-bQo8X

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Mar 24, 2026
@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from dc45a12 to 823531a Compare March 24, 2026 14:27
@myieye myieye force-pushed the claude/fix-morph-type-syncing-bQo8X branch from 46c8f87 to f8f4cfe Compare March 24, 2026 15:39
@myieye myieye changed the base branch from claude/add-lexeme-headwords-TowRX to feat/sync-morph-types March 24, 2026 15:40
@myieye myieye marked this pull request as ready for review March 24, 2026 15:40
@myieye myieye closed this Mar 24, 2026
@myieye myieye reopened this Mar 24, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 24, 2026

UI unit Tests

  1 files  ±0   56 suites  ±0   28s ⏱️ ±0s
166 tests ±0  166 ✅ ±0  0 💤 ±0  0 ❌ ±0 
233 runs  ±0  233 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 8a43b19. ± Comparison against base commit 6af9827.

♻️ This comment has been updated with latest results.

@argos-ci
Copy link
Copy Markdown

argos-ci bot commented Mar 24, 2026

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build Status Details Updated (UTC)
default (Inspect) ⚠️ Changes detected (Review) 6 changed Apr 8, 2026, 3:59 PM

@myieye myieye marked this pull request as draft March 27, 2026 08:17
@rmunn
Copy link
Copy Markdown
Contributor

rmunn commented Mar 31, 2026

Now working on getting tests to pass. Writing down my findings as I go, so this is going to be a little bit stream-of-consciousness.

Sorting order tests are acting non-deterministic, with different sort orders returning on different runs of the same test. I.e., one run of the test might produce the sort order [suffix, root, prefix] and the next run of the same test might produce [suffix, prefix, root].

I'm starting to suspect homograph numbers may be to blame. The MorphTokens_DoNotAffectSortOrder test creates three entries, "aaaa" (a root), "-aaaa" (a suffix), and "aaaa-" (a prefix). Those should not be treated as homographs... but if they are being given homograph numbers, then the order of the homograph numbers may entirely depend on the order in which the homographs are found.

... And I just noticed that the sorting tests are ending up sorting those three words in an order that matches the alphabetical ordering of their GUID. Which would be consistent with the theory that the homograph number is what's being sorted on here.

... But a good argument against that is that the tests that are failing are the LcmCrdt sorting tests, where the "sort by HomographNumber" lines are still commented out! Might have to search for a new theory.

@rmunn rmunn self-assigned this Mar 31, 2026
@rmunn
Copy link
Copy Markdown
Contributor

rmunn commented Mar 31, 2026

It looks like the predefined morph types aren't being seeded in the sorting tests, and that may be the cause of the failures.

@rmunn
Copy link
Copy Markdown
Contributor

rmunn commented Mar 31, 2026

The issue for the sort tests was that the morph types, and other prepopulated data, are populated in CrdtProjectsService.CreateProject, which calls SeedSystemData. But there's a comment in MiniLcmApiFixture explaining that it doesn't call CrdtProjectsService.CreateProject because that ends up opening and closing the DB context (due to the using var db = ... statement in CreateProject), which would end up erasing/resetting the in-memory DB. Instead, MiniLcmApiFixture reproduces the steps that CreateProject goes through, except for seeding initial data such as writing systems.

I added a step to MiniLcmApiFixture to seed morph types (but not other initial data like semantic domains or parts of speech), and now the tests appear to be passing because the sort order logic can actually look up the morph types and get their SecondaryOrder.

The earlier test failures were because the morph types weren't present, so the secondary order was always 0 (the order of Stem), and so the DB found records that were exactly equal in all clauses of its ORDER BY statement, and was therefore allowed to return them in arbitrary order. Which also explains why the sorting tests were failing non-deterministically, because the order of the rows returned by the DB was non-deterministic — probably the order that the rows were stored in the SQLite database, which would depend on a host of factors.

@rmunn
Copy link
Copy Markdown
Contributor

rmunn commented Apr 1, 2026

Sena-3 sync test errors are because the predefined morph types that Claude generated include correct names and abbreviations, but do not include descriptions. So when the sena-3 project is synced from FW, it creates 19 changes (when 0 are expected) that just consist of bringing in the descriptions. I'll push a commit that includes descriptions in the predefined data, because if we're matching FW then let's match all of FW.

@rmunn
Copy link
Copy Markdown
Contributor

rmunn commented Apr 1, 2026

Note that the descriptions in Sena-3 do not completely match the descriptions found in new FW projects. Specifically, the description for Stem in Sena-3 is a single run, but in new FW projects, the description for Stem includes some runs with the "Emphasized Text" style. The text is the same as far as I can see, it's just that some of the words were emphasized at some point in the past.

Which means that when we sync projects that are not sena-3, we should expect at least one change in morph types, as the .fwdata file will contain a different Stem description (with different numbers of runs) than what I'm currently putting into the CanonicalMorphTypes data. Ways we could handle that:

  • Include those styled runs in the CanonicalMorphTypes description for Stem, and update our sena-3.fwdata test data to match the description from liblcm's Templates/NewLangProj.fwdata file so that Sena-3 sync tests continue to pass.
  • Include those styled runs in the CanonicalMorphTypes description for Stem, do not update our sena-3.fwdata test data, and instead have the Sena-3 sync tests expect one change rather than 0 (or have them expect no more than 1 change).
  • Do not include those styled runs in the CanonicalMorphTypes description for Stem, and don't worry about the fact that every sync of a real project will include a single change to a morph type description (but only once).

@rmunn
Copy link
Copy Markdown
Contributor

rmunn commented Apr 1, 2026

I believe the reordering of the entries in the project snapshots may be because .GetEntries() is returning entries in a sort order depending on morph types, and we might just want to reorder the entries in the test data snapshots that we store in the repo to match the new sort order. (Which would show up in this PR because now there are morph types in a CRDT project from the start).

But the change in commit 1e94608 is good enough for now.

Tests are passing; remaining failing check is Argos complaining about the visual change of including morph type prefixes in the UI, and I don't seem to have the access permissions to tell Argos "Yes, that's the new standard of what it should look like". Everything except that is green, so tests are all passing now and PR can be marked ready.

@rmunn rmunn marked this pull request as ready for review April 1, 2026 06:51
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs (1)

11-11: Avoid exposing mutable canonical instances globally.

All is frozen, but each MorphType value is still mutable and shared process-wide. Please expose immutable definitions (or clone on access) so callers cannot mutate canonical state accidentally.

As per coding guidelines "Prefer Records for DTOs and immutable data structures in C# code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs` at line 11, The static
field CanonicalMorphTypes.All currently exposes shared mutable MorphType
instances; update the API so callers cannot mutate canonical state by either
converting MorphType into an immutable type (preferably a record or readonly
struct) and rebuilding CreateAll() to produce those immutable instances, or keep
MorphType as-is but change CanonicalMorphTypes.All to store and return defensive
copies (e.g., expose a method/property that returns clones from CreateAll()/All
instead of the original instances). Locate CreateAll(), the MorphType type
definition, and the All field to implement one of these approaches and ensure
any consumers use the new immutable type or the clone-returning accessor.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs`:
- Line 11: The static field CanonicalMorphTypes.All currently exposes shared
mutable MorphType instances; update the API so callers cannot mutate canonical
state by either converting MorphType into an immutable type (preferably a record
or readonly struct) and rebuilding CreateAll() to produce those immutable
instances, or keep MorphType as-is but change CanonicalMorphTypes.All to store
and return defensive copies (e.g., expose a method/property that returns clones
from CreateAll()/All instead of the original instances). Locate CreateAll(), the
MorphType type definition, and the All field to implement one of these
approaches and ensure any consumers use the new immutable type or the
clone-returning accessor.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: df3c003c-8913-498d-bba3-1e2df8c429be

📥 Commits

Reviewing files that changed from the base of the PR and between e9f7255 and 1e94608.

📒 Files selected for processing (28)
  • backend/FwLite/FwDataMiniLcmBridge/Api/Sorting.cs
  • backend/FwLite/FwLiteProjectSync.Tests/FluentAssertGlobalConfig.cs
  • backend/FwLite/FwLiteProjectSync.Tests/Import/ResumableTests.cs
  • backend/FwLite/FwLiteProjectSync.Tests/ProjectSnapshotSerializationTests.cs
  • backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs
  • backend/FwLite/FwLiteProjectSync.Tests/SyncTests.cs
  • backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs
  • backend/FwLite/LcmCrdt.Tests/Changes/ChangeSerializationTests.cs
  • backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs
  • backend/FwLite/LcmCrdt.Tests/Data/MigrationTests.cs
  • backend/FwLite/LcmCrdt.Tests/Data/RegressionTestHelper.cs
  • backend/FwLite/LcmCrdt.Tests/Data/SnapshotDeserializationTests.cs
  • backend/FwLite/LcmCrdt.Tests/DataModelSnapshotTests.cs
  • backend/FwLite/LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.cs
  • backend/FwLite/LcmCrdt.Tests/MiniLcmApiFixture.cs
  • backend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.cs
  • backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs
  • backend/FwLite/LcmCrdt.Tests/OpenProjectTests.cs
  • backend/FwLite/LcmCrdt/CrdtProjectsService.cs
  • backend/FwLite/LcmCrdt/CurrentProjectService.cs
  • backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs
  • backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.Designer.cs
  • backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.cs
  • backend/FwLite/LcmCrdt/Objects/PreDefinedData.cs
  • backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs
  • backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs
  • backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs
  • backend/FwLite/Taskfile.yml
💤 Files with no reviewable changes (3)
  • backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs
  • backend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.cs
  • backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs

@myieye
Copy link
Copy Markdown
Collaborator Author

myieye commented Apr 1, 2026

We should have consistent ordering of entries in our snapshots. It's just part of the Get entries API: The result is sorted. I had test failures in the branch I already merged due to sort order and fixed them by commenting out the HomographNumber from fwdata sorting. Because we fall back to sorting by ID, we get consistent sorting in both CRDT and FWDATA even without homographs numbers.

That's likely not all relevant. But, I don't think it's correct to not enforce strict ordering of entries.

I didn't read everything you've written.

Copy link
Copy Markdown
Contributor

@rmunn rmunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few changes needed before we can merge.

Copy link
Copy Markdown
Contributor

@rmunn rmunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, all test changes I felt were necessary have now been made.

@rmunn
Copy link
Copy Markdown
Contributor

rmunn commented Apr 2, 2026

@CodeRabbit review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@rmunn
Copy link
Copy Markdown
Contributor

rmunn commented Apr 2, 2026

@CodeRabbit fullreview

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 2, 2026

@rmunn Sure, I'll redo the full review of all the changes in this PR.


🧠 Learnings used
Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 2192
File: backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs:125-127
Timestamp: 2026-03-11T04:58:21.676Z
Learning: In sillsdev/languageforge-lexbox backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs SyncInternal, the two-pass sync pattern is intentional and lossless: pass 1 (e.g., line 126 for MorphTypeData) updates the live CRDT from FW data using the project snapshot as "before"; pass 2 (e.g., line 127) reads the *already-updated* live CRDT and pushes it to FW. This means a null value in the CRDT snapshot cannot destructively wipe an FW value — it will be reconciled to the FW value in pass 1 before pass 2 ever runs. A null reaching FW in pass 2 only happens when the user intentionally deleted it in CRDT, which is correct bidirectional sync behavior.

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Applies to backend/FwLite/**/*.cs : Before making changes to MiniLcm, LcmCrdt, FwDataMiniLcmBridge, or FwLiteProjectSync: read the relevant section thoroughly, understand the sync flow end-to-end, run the full test suite with `dotnet test FwLiteOnly.slnf`, and test with real FwData projects

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Applies to backend/FwLite/**/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs : ProjectSnapshot must be regenerated from CRDT after sync completes. The order of sync operations is critical: WritingSystems → Publications → PartsOfSpeech → SemanticDomains → ComplexFormTypes → Entries (entries sync last as they depend on others). Complex forms sync is two-phase: SyncWithoutComplexFormsAndComponents first, then SyncComplexFormsAndComponents. Ensure `fwdataApi.Save()` is called after changes.

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Before making changes to MiniLcm, LcmCrdt, FwDataMiniLcmBridge, or FwLiteProjectSync: read the relevant section thoroughly, understand the sync flow end-to-end, run the full test suite with `dotnet test FwLiteOnly.slnf`, and test with real FwData projects

Learnt from: CR
Repo: sillsdev/languageforge-lexbox PR: 0
File: backend/FwLite/AGENTS.md:0-0
Timestamp: 2026-02-04T14:29:56.355Z
Learning: Applies to backend/FwLite/**/FwLiteProjectSync.Tests/**/*.cs : When fixing a sync bug, reproduce it with a test. Use `DryRunMiniLcmApi` to see what changes would be made. Debug through `CrdtFwdataProjectSyncService.Sync()` and verify `ProjectSnapshot` handling.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 2192
File: backend/FwLite/LcmCrdt/Changes/CreateMorphTypeDataChange.cs:21-32
Timestamp: 2026-03-11T02:23:28.891Z
Learning: In sillsdev/languageforge-lexbox LcmCrdt, the MorphTypeData table's UNIQUE ON CONFLICT IGNORE constraint on the MorphType column is implemented in migration 20260306045826_AddMorphTypeDataTable.cs using migrationBuilder.Sql() with a hand-written CREATE TABLE DDL (since EF Core's CreateTable() API doesn't support ON CONFLICT clauses). A separate migrationBuilder.CreateIndex() for IX_MorphTypeData_MorphType is also emitted so EF Core's model snapshot stays consistent with HasIndex(...).IsUnique() configuration. A separate migration 20260310065457_UniqueIndexOnMorphType (which used CREATE UNIQUE INDEX without ON CONFLICT support) was deleted as part of this consolidation.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:33:24.170Z
Learning: In backend/FwLite/LcmCrdt, the MorphTypeData table has a unique index on the MorphType column (corresponding to the MorphType C# enum). The intended conflict resolution is "first one wins" using SQLite's ON CONFLICT IGNORE, because: (1) CRDT's NewEntity() cannot return null, so duplicate CreateMorphTypeDataChange objects (from concurrent offline users) result in a second row with DeletedAt set, potentially violating the unique constraint; (2) In practice, users cannot create new morph types in FieldWorks Classic or Lite. SQLite's ON CONFLICT IGNORE clause can only be applied to constraints defined inline in CREATE TABLE (not to separately-created indexes via CREATE UNIQUE INDEX), so implementing this requires either rebuilding the table with an inline UNIQUE ... ON CONFLICT IGNORE constraint or using an EF Core SaveChanges interceptor with INSERT OR IGNORE.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:50:35.872Z
Learning: In sillsdev/languageforge-lexbox LcmCrdt, when a SQLite table needs `UNIQUE ON CONFLICT IGNORE` on a column (e.g., MorphTypeData.MorphType), the EF Core migration must use `migrationBuilder.Sql()` for the CREATE TABLE DDL (since EF Core's CreateTable() API doesn't support ON CONFLICT clauses). A separate named `CREATE UNIQUE INDEX` should also be emitted via `migrationBuilder.Sql()` so EF Core's model snapshot stays consistent with `HasIndex(...).IsUnique()` configuration. SQLite allows both the inline constraint and the named index on the same column; the inline ON CONFLICT IGNORE fires first on duplicate inserts.

Learnt from: rmunn
Repo: sillsdev/languageforge-lexbox PR: 0
File: :0-0
Timestamp: 2026-03-10T08:34:03.262Z
Learning: In the LcmCrdt SQLite database for sillsdev/languageforge-lexbox, `ON CONFLICT IGNORE` for a unique constraint on MorphTypeData.MorphType cannot be implemented via a standalone `CREATE UNIQUE INDEX` statement (SQLite doesn't support ON CONFLICT clauses on CREATE INDEX). It must be a table-level constraint in the CREATE TABLE DDL, requiring a table-rebuild migration. The practical risk is low because morph type creation is restricted in both FieldWorks Classic and FW Lite, but the theoretical CRDT offline-sync race condition exists.
✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs (1)

35-42: ⚠️ Potential issue | 🟡 Minor

Missing seeded Description parity assertion

This test validates most canonical fields but skips Description, so a description-mapping regression can pass unnoticed.

Suggested assertion addition
         foreach (var canonical in CanonicalMorphTypes.All.Values)
         {
             var mt = morphTypes.Should().ContainSingle(m => m.Kind == canonical.Kind).Subject;
             mt.Id.Should().Be(canonical.Id);
             mt.Name["en"].Should().Be(canonical.Name["en"]);
             mt.Abbreviation["en"].Should().Be(canonical.Abbreviation["en"]);
+            mt.Description["en"].Should().BeEquivalentTo(canonical.Description["en"]);
             mt.Prefix.Should().Be(canonical.Prefix);
             mt.Postfix.Should().Be(canonical.Postfix);
             mt.SecondaryOrder.Should().Be(canonical.SecondaryOrder);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs` around lines 35 - 42,
The test in MorphTypeSeedingTests checks many canonical fields but omits
Description, so add an assertion to compare the seeded morph type's Description
for the same locale; after locating the existing assertions that inspect mt (the
Subject from morphTypes.Should().ContainSingle(m => m.Kind == canonical.Kind)),
add a check that mt.Description["en"].Should().Be(canonical.Description["en"])
(or the appropriate locale key used elsewhere) to ensure description parity
between the seeded mt and canonical.
🧹 Nitpick comments (1)
backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs (1)

114-117: Strengthen canonical parity check with Name and Abbreviation assertions.

The test currently validates IDs/affixes/order, but misses user-visible label drift.

Suggested test hardening
             var canonical = CanonicalMorphTypes.All[fwMorphType.Kind];
             canonical.Id.Should().Be(fwMorphType.Id, $"GUID for {fwMorphType.Kind} should match FwData");
+            canonical.Name["en"].Should().Be(fwMorphType.Name["en"], $"Name for {fwMorphType.Kind} should match FwData");
+            canonical.Abbreviation["en"].Should().Be(fwMorphType.Abbreviation["en"], $"Abbreviation for {fwMorphType.Kind} should match FwData");
             canonical.Prefix.Should().Be(fwMorphType.Prefix, $"Prefix for {fwMorphType.Kind} should match FwData");
             canonical.Postfix.Should().Be(fwMorphType.Postfix, $"Postfix for {fwMorphType.Kind} should match FwData");
             canonical.SecondaryOrder.Should().Be(fwMorphType.SecondaryOrder, $"SecondaryOrder for {fwMorphType.Kind} should match FwData");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs` around lines 114 -
117, Add assertions to the existing parity checks to validate user-facing labels
by asserting canonical.Name.Should().Be(fwMorphType.Name, $"Name for
{fwMorphType.Kind} should match FwData") and
canonical.Abbreviation.Should().Be(fwMorphType.Abbreviation, $"Abbreviation for
{fwMorphType.Kind} should match FwData"); place these alongside the existing
Id/Prefix/Postfix/SecondaryOrder assertions in the Sena3SyncTests (where
canonical and fwMorphType are compared) so name and abbreviation drift will fail
the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/FwLite/LcmCrdt.Tests/Data/RegressionTestHelper.cs`:
- Around line 73-78: The RegressionVersion enum now includes v3 but tests and
test data are not updated; either add a v3.sql test fixture and wiring or remove
the enum. To fix: create the missing backend/FwLite/LcmCrdt.Tests/Data/v3.sql
with the expected scripted DB state, then update MigrationTests.cs to add
[InlineData(RegressionVersion.v3)] to both VerifyAfterMigrationFromScriptedDb
and VerifyRegeneratedSnapshotsAfterMigrationFromScriptedDb, ensuring the test
helpers (e.g., methods that map enum to file names) recognize "v3";
alternatively, if v3 is premature, remove the v3 member from the
RegressionVersion enum to restore consistency.

In `@backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs`:
- Around line 99-103: The test SeedingIsIdempotent in class
MorphTypeSeedingTests doesn't actually re-trigger seeding because MigrationTasks
is static, so either make the test explicitly force a second seed run by
resetting the static state (clear MigrationTasks) before calling MigrateDb a
second time, or rename the test to reflect that it only verifies duplicate-safe
counts after reopen; update the test to call MigrationTasks.Clear() (or the
actual static reset method/property) and then call MigrateDb again to assert
idempotence, or change the test name to e.g.,
SeedingDoesNotCreateDuplicatesOnReopen to match current behavior.

---

Duplicate comments:
In `@backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs`:
- Around line 35-42: The test in MorphTypeSeedingTests checks many canonical
fields but omits Description, so add an assertion to compare the seeded morph
type's Description for the same locale; after locating the existing assertions
that inspect mt (the Subject from morphTypes.Should().ContainSingle(m => m.Kind
== canonical.Kind)), add a check that
mt.Description["en"].Should().Be(canonical.Description["en"]) (or the
appropriate locale key used elsewhere) to ensure description parity between the
seeded mt and canonical.

---

Nitpick comments:
In `@backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs`:
- Around line 114-117: Add assertions to the existing parity checks to validate
user-facing labels by asserting canonical.Name.Should().Be(fwMorphType.Name,
$"Name for {fwMorphType.Kind} should match FwData") and
canonical.Abbreviation.Should().Be(fwMorphType.Abbreviation, $"Abbreviation for
{fwMorphType.Kind} should match FwData"); place these alongside the existing
Id/Prefix/Postfix/SecondaryOrder assertions in the Sena3SyncTests (where
canonical and fwMorphType are compared) so name and abbreviation drift will fail
the test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ba07c2f8-bf11-4a7f-b91f-abb9d34c74b9

📥 Commits

Reviewing files that changed from the base of the PR and between e9f7255 and b780ecc.

📒 Files selected for processing (27)
  • backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs
  • backend/FwLite/FwLiteProjectSync.Tests/FluentAssertGlobalConfig.cs
  • backend/FwLite/FwLiteProjectSync.Tests/Import/ResumableTests.cs
  • backend/FwLite/FwLiteProjectSync.Tests/ProjectSnapshotSerializationTests.cs
  • backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs
  • backend/FwLite/FwLiteProjectSync.Tests/SyncTests.cs
  • backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs
  • backend/FwLite/LcmCrdt.Tests/Changes/ChangeSerializationTests.cs
  • backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs
  • backend/FwLite/LcmCrdt.Tests/Data/MigrationTests.cs
  • backend/FwLite/LcmCrdt.Tests/Data/RegressionTestHelper.cs
  • backend/FwLite/LcmCrdt.Tests/Data/SnapshotDeserializationTests.cs
  • backend/FwLite/LcmCrdt.Tests/DataModelSnapshotTests.cs
  • backend/FwLite/LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.cs
  • backend/FwLite/LcmCrdt.Tests/MiniLcmApiFixture.cs
  • backend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.cs
  • backend/FwLite/LcmCrdt.Tests/MorphTypeSeedingTests.cs
  • backend/FwLite/LcmCrdt.Tests/OpenProjectTests.cs
  • backend/FwLite/LcmCrdt/CrdtProjectsService.cs
  • backend/FwLite/LcmCrdt/CurrentProjectService.cs
  • backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.Designer.cs
  • backend/FwLite/LcmCrdt/Migrations/20260318120000_RegenerateSearchTableForMorphTypes.cs
  • backend/FwLite/LcmCrdt/Objects/PreDefinedData.cs
  • backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs
  • backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs
  • backend/FwLite/MiniLcm/Models/CanonicalMorphTypes.cs
  • backend/FwLite/Taskfile.yml
💤 Files with no reviewable changes (1)
  • backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs

@myieye myieye force-pushed the feat/sync-morph-types branch from e9f7255 to 6af9827 Compare April 7, 2026 15:30
rmunn added 7 commits April 7, 2026 17:39
Project snapshots for CRDT and FW are returning entries in different
order, for some not-yet-identified reason. As long as the entries with
the same ID compare the same, we should not fail the Sena-3 sync tests
just because the snapshots have the entries in a different order.

Once the tests pass, I may revert this commit and investigate why the
entries in the snapshot are coming out in a different order.
No need to set "WithoutStrictOrderingFor(x -> x.Entries)" if the FW and
CRDT APIs are sorting entries the same way. The difference in sorting
was because the FW API had been set to sort by HomographNumber, but CRDT
has not yet implemented HomographNumber so the sorting was different.
Removing the sort by HomographNumber from the FW API results in the
tests passing because snapshots are sorted in the same order again.
Test comments should reflect actual SecondaryOrder values from canonical
morph types, so that we don't get confused later.
Check that descriptions got created since they're essential to having
the correct number of changes in Sena-3 sync tests, and do a bit more to
check that the SeedingIsIdempotent_OpeningProjectTwiceDoesNotDuplicate
test is actually checking what we think it's checking.
Seeding the morph types does not actually create a change in the
serialization format of project snapshots, so we don't actually need a
RegressionVersion.v3 and corresponding tests for it.
@myieye myieye force-pushed the claude/fix-morph-type-syncing-bQo8X branch from 26cf94c to bcbe1bc Compare April 7, 2026 15:40
@myieye
Copy link
Copy Markdown
Collaborator Author

myieye commented Apr 7, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 7, 2026

✅ Actions performed

Full review triggered.

@myieye myieye merged commit c30abbb into feat/sync-morph-types Apr 9, 2026
13 of 14 checks passed
@myieye myieye deleted the claude/fix-morph-type-syncing-bQo8X branch April 9, 2026 07:14
@myieye myieye linked an issue Apr 9, 2026 that may be closed by this pull request
myieye added a commit that referenced this pull request Apr 9, 2026
* Seed canonical morph-types into CRDT projects

- Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM)
- Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes
- Seed morph-types for existing projects in MigrateDb (before FTS refresh)
- Add EF migration to clear FTS table so headwords are rebuilt with morph tokens
- Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates

* Stop creating morph-types in tests. They're now prepopulated

* Stop printing verify diff content. It's too much.

* Seed morph types before API testing

* Add descriptions to canonical morph types

* Sync morph-types when importing, because they already exist in CRDT

* Verify our canonical morph-types match new fwdata projects

* Fix non-FTS relevance order with morph-tokens in query
myieye added a commit that referenced this pull request Apr 9, 2026
* Seed canonical morph-types into CRDT projects

- Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM)
- Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes
- Seed morph-types for existing projects in MigrateDb (before FTS refresh)
- Add EF migration to clear FTS table so headwords are rebuilt with morph tokens
- Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates

* Stop creating morph-types in tests. They're now prepopulated

* Stop printing verify diff content. It's too much.

* Seed morph types before API testing

* Add descriptions to canonical morph types

* Sync morph-types when importing, because they already exist in CRDT

* Verify our canonical morph-types match new fwdata projects

* Fix non-FTS relevance order with morph-tokens in query
myieye added a commit that referenced this pull request Apr 9, 2026
* Seed canonical morph-types into CRDT projects

- Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM)
- Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes
- Seed morph-types for existing projects in MigrateDb (before FTS refresh)
- Add EF migration to clear FTS table so headwords are rebuilt with morph tokens
- Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates

* Stop creating morph-types in tests. They're now prepopulated

* Stop printing verify diff content. It's too much.

* Seed morph types before API testing

* Add descriptions to canonical morph types

* Sync morph-types when importing, because they already exist in CRDT

* Verify our canonical morph-types match new fwdata projects

* Fix non-FTS relevance order with morph-tokens in query
myieye added a commit that referenced this pull request Apr 9, 2026
* Seed canonical morph-types into CRDT projects

- Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM)
- Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes
- Seed morph-types for existing projects in MigrateDb (before FTS refresh)
- Add EF migration to clear FTS table so headwords are rebuilt with morph tokens
- Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates

* Stop creating morph-types in tests. They're now prepopulated

* Stop printing verify diff content. It's too much.

* Seed morph types before API testing

* Add descriptions to canonical morph types

* Sync morph-types when importing, because they already exist in CRDT

* Verify our canonical morph-types match new fwdata projects

* Fix non-FTS relevance order with morph-tokens in query
myieye added a commit that referenced this pull request Apr 10, 2026
* Seed canonical morph-types into CRDT projects

- Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM)
- Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes
- Seed morph-types for existing projects in MigrateDb (before FTS refresh)
- Add EF migration to clear FTS table so headwords are rebuilt with morph tokens
- Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates

* Stop creating morph-types in tests. They're now prepopulated

* Stop printing verify diff content. It's too much.

* Seed morph types before API testing

* Add descriptions to canonical morph types

* Sync morph-types when importing, because they already exist in CRDT

* Verify our canonical morph-types match new fwdata projects

* Fix non-FTS relevance order with morph-tokens in query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add canonical morph-types and refresh headwords

3 participants