[DO NOT MERGE]: HSM Sync Engine by dtkav · Pull Request #69 · No-Instructions/Relay

dtkav · 2026-02-19T19:51:11Z

No description provided.

- Add state vector merge routing tests verifying that stateVectorIsAhead drives correct merge path selection (disk-only, remote-only, three-way)
- Remove 3 broken MergeManager tests (createHSM doesn't wire effect subscriptions; MergeHSM has no destroy method)
- Delete obsolete state-vector-debug.test.ts (used removed manager.register API)
- Update all 25 hibernation tests: documents start warm after notifyHSMCreated, not hibernated. Buffer tests use maxConcurrentWarm:0 and explicit hibernate() to prevent processWakeQueue from draining.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
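The routing these tests exercise can be pictured roughly as below. This is a minimal sketch, not the repo's code: it assumes state vectors are plain clientId-to-clock maps and that the local (disk-side) and remote vectors are each compared against the last common ancestor. `isAhead` and `chooseMergePath` are hypothetical names standing in for whatever stateVectorIsAhead feeds.

```typescript
// Hypothetical model: a state vector maps clientId -> highest clock seen.
type StateVector = Map<number, number>;
type MergePath = "none" | "disk-only" | "remote-only" | "three-way";

// True when `a` contains at least one op that `b` has not seen.
function isAhead(a: StateVector, b: StateVector): boolean {
  let ahead = false;
  a.forEach((clock, client) => {
    if (clock > (b.get(client) ?? 0)) ahead = true;
  });
  return ahead;
}

// Assumed routing: compare each side against the last common ancestor (LCA).
function chooseMergePath(
  lca: StateVector,
  local: StateVector,
  remote: StateVector,
): MergePath {
  const localAhead = isAhead(local, lca);
  const remoteAhead = isAhead(remote, lca);
  if (localAhead && remoteAhead) return "three-way";
  if (localAhead) return "disk-only";
  if (remoteAhead) return "remote-only";
  return "none";
}
```

A clock that only one side has advanced selects the one-sided path; both sides advancing falls through to the three-way merge.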

Shadow mode was designed for parallel HSM comparison but was only used in tests, not production. Removing to reduce maintenance burden and dead code.
Deleted:
- src/merge-hsm/shadow/ (ShadowMergeHSM, ShadowManager, types)
- src/merge-hsm/__tests__/shadow.test.ts
- Shadow exports from index.ts
- Divergence tracking from HSMDebuggerView
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Using path in WRITE_DISK effects was unsafe for renames - if a file was renamed while the effect was in-flight, the write would target the wrong location. Now effects carry the stable guid, and handlers look up the current path at write time.
- WriteDiskEffect.path → WriteDiskEffect.guid
- DiskIntegration filters by guid, gets path via getter
- handleIdleWriteDisk looks up path from guid
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
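A minimal sketch of the guid-at-write-time idea, assuming a lookup callback from guid to current path. `WriteDiskEffectSketch` and `DiskWriterSketch` are hypothetical shapes for illustration; only the switch from path to guid comes from this commit.

```typescript
// Hypothetical effect shape: carries the stable guid, never a path.
interface WriteDiskEffectSketch {
  type: "WRITE_DISK";
  guid: string;
  contents: string;
}

class DiskWriterSketch {
  // Records writes instead of touching a real filesystem.
  readonly written: Array<{ path: string; contents: string }> = [];
  private readonly pathForGuid: (guid: string) => string | undefined;

  constructor(pathForGuid: (guid: string) => string | undefined) {
    this.pathForGuid = pathForGuid;
  }

  handle(effect: WriteDiskEffectSketch): boolean {
    // Resolve the path only now, so a rename that happened while the
    // effect was in flight is honoured.
    const path = this.pathForGuid(effect.guid);
    if (path === undefined) return false; // document no longer tracked
    this.written.push({ path, contents: effect.contents });
    return true;
  }
}
```

Because the effect never stores a path, a rename between emitting and handling the effect changes only what the lookup returns.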

…idge
The recording bridge was caching path at recording start, so renamed files showed stale paths in recordings. Now getFullPath() is called at each entry to capture the current path.
- getDocument(guid) → getHSM(guid) returns just the HSM
- New getFullPath(guid) returns current full vault path
- Path is resolved fresh at each timeline entry
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove the legacy LiveEditPlugin (592 lines) which was fully replaced by HSMEditorPlugin for markdown documents. Move the connectionManagerFacet import in ShareLinkPlugin to LiveNodePlugin. Add periodic editor↔CRDT drift detection in CM6Integration that runs every 5 seconds during active.tracking. When drift is found (editor content diverges from localDoc), diagnostics are written to both the relay log and HSM recording file, then the editor is corrected to match the CRDT. Enhanced checkAndCorrectDrift() to accept actual editor text instead of relying on the cached lastKnownEditorText, which can go stale if changes bypass the CM6_CHANGE event path (e.g. Obsidian's metadata renderer calling setViewData directly). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add getConnectionManager() that finds LiveViewManager via app.plugins
- Remove StateField, Compartment, reconfigure(), and wipe() machinery
- Simplify load() to register extensions once (idempotent)
- Remove isLiveMd check and view fallback in HSMEditorPlugin
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Idle merge operations (remote, disk, three-way) previously called transitionTo("idle.synced") directly inside async callbacks, which could complete before the async hash computation, causing the state machine to show idle.synced while work was still in progress. Now, real merge cases send an IDLE_MERGE_COMPLETE event after the async hash completes, with all effects and the state transition happening synchronously after the await. This ensures intermediate states (idle.remoteAhead, idle.diskAhead, idle.diverged) are observable while async work runs.
- Add IdleMergeCompleteEvent type and handler
- Refactor performIdleRemoteAutoMerge, performIdleDiskAutoMerge, performIdleThreeWayMerge to use event-driven completion
- Compute hash before emitting effects to avoid interleaving window
- Add serialization/deserialization for the new event type
- Fix loadToConflict test helper to pre-compute hash before SET_MODE_IDLE to avoid microtask boundary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add seeded PRNG-driven random delays to test infrastructure to catch timing-related bugs. Controlled via environment variables:
- TEST_ASYNC_DELAYS=1 enables random delays (0-10ms)
- TEST_SEED=<number> sets reproducible random sequence
Changes:
- Add random.ts with Mulberry32 PRNG fountain
- Add async delays to persistence sync, destroy, and diskLoader
- Add sendAcquireLock/sendAcquireLockToTracking helpers that properly await async state transitions
- Update loadToIdle to wait for persistence sync before returning
- Update loadToConflict to await state after acquireLock
- Convert ~30 tests to use async-aware helpers
All 89 tests pass with and without TEST_ASYNC_DELAYS=1.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
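For reference, a sketch of a Mulberry32 generator feeding a delay helper. The generator is the commonly published Mulberry32 routine; `nextDelayMs` and its parameters are hypothetical, and the actual random.ts may be structured differently (for example, it presumably reads TEST_SEED and TEST_ASYNC_DELAYS itself, which this sketch takes as arguments).

```typescript
// Mulberry32: a small seeded 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: a delay in [0, maxMs) when delays are enabled, else 0.
function nextDelayMs(rand: () => number, enabled: boolean, maxMs = 10): number {
  return enabled ? rand() * maxMs : 0;
}
```

Two generators built from the same seed yield the same sequence, which is what makes a failing timing interleaving reproducible.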

DRIFT_DETECTED events were using a non-standard format (type/timestamp/state) which broke the visualizer. Now uses standard fields (ns/ts/path/event/from/to). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove enableMergeHSMRecording flag (always install bridge, lightweight without onEntry)
- Delete RecordingMergeHSM.ts and generateTest.ts (unused)
- Consolidate StreamingEntry into HSMLogEntry
- Simplify recording bridge, replay, and serialization modules
- Update tests for new recording API
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The relay-live-editor CSS class is added asynchronously after acquireLock(). Previously, edits arriving before initialization were lost.
- Don't destroy plugin early if CSS class not yet present
- Buffer CM6 changes that arrive before HSM is ready
- Replay buffered edits once initialization completes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
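The buffer-then-replay behaviour can be sketched as follows. `EarlyEditBuffer` is a hypothetical name and the change type is left generic; how the plugin really detects readiness is not shown here.

```typescript
// Hypothetical buffer: holds changes until the consumer is ready,
// then replays them in arrival order.
class EarlyEditBuffer<T> {
  private pending: T[] = [];
  private ready = false;
  private readonly deliver: (change: T) => void;

  constructor(deliver: (change: T) => void) {
    this.deliver = deliver;
  }

  push(change: T): void {
    if (this.ready) this.deliver(change);
    else this.pending.push(change); // too early: keep it instead of dropping it
  }

  markReady(): void {
    this.ready = true;
    const queued = this.pending;
    this.pending = [];
    for (const change of queued) this.deliver(change);
  }
}
```

Edits pushed before markReady() are delivered in arrival order once initialization completes; later edits pass straight through.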

Validates remote Yjs updates before applying them to prevent corrupted updates from breaking document state. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Disk edits in idle mode now create a fork snapshot of localDoc before ingesting changes. This enables three-way reconciliation when the provider reconnects: fork.base serves as the common ancestor for merging local (disk edit) and remote changes. SyncGate controls CRDT op flow between localDoc and remoteDoc, blocking remote-to-local merges while a fork is unreconciled. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Block syncLocalToRemote() while a fork is unreconciled. This prevents local changes from propagating to remote before fork reconciliation confirms they're safe. When the fork clears (clearForkAndUpdateLCA), pending outbound changes are flushed by recomputing the diff from localDoc to remoteDoc. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When a fork is created in idle mode (disk edit while provider not synced), the HSM emits REQUEST_PROVIDER_SYNC effect. SharedFolder handles this by:
1. Downloading latest state via backgroundSync
2. Applying updates to remoteDoc
3. Sending CONNECTED + PROVIDER_SYNCED to HSM
4. HSM then runs fork reconciliation
This ensures disk edits are reconciled promptly without waiting for the user to open the file.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When PROVIDER_SYNCED arrives in active.tracking with a fork present, reconcileForkInActive runs three-way merge and dispatches granular changes to the editor via computeDiffChanges (not full replace). This ensures fork reconciliation happens promptly when the user has the file open, rather than waiting for close/reopen. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fork reconciliation now queries provider.synced via callback instead of storing duplicate state in _syncGate.providerSynced for idle mode.
Changes:
- Add isProviderSynced callback to MergeHSMConfig
- invokeForkReconcile uses callback to check provider state
- Add awaitingProvider guard to stay in idle.localAhead until synced
- Add hasFork guard to accumulate REMOTE_UPDATE during fork reconciliation
- Test harness tracks provider state and passes callback
This avoids state ownership issues where the HSM would store providerSynced but not receive DISCONNECTED events in idle mode to clear it.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

destroyLocalDoc() now nulls out references synchronously and does async IDB cleanup on captured refs, preventing races where wake() recreates localDoc while cleanup is still running. wake() and processWakeQueue() call ensureLocalDocForIdle() to recreate localDoc after hibernation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
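The null-first, clean-up-later ordering might look like this in miniature. `LocalDocOwner` and `DocHandle` are hypothetical stand-ins; the real code deals with a Y.Doc and its IDB persistence rather than a plain object.

```typescript
// Hypothetical stand-in for a doc plus its persistence handle.
interface DocHandle {
  id: number;
  closed: boolean;
}

class LocalDocOwner {
  doc: DocHandle | null = null;
  private nextId = 1;

  // Recreate the doc on demand, e.g. when waking from hibernation.
  ensureLocalDoc(): DocHandle {
    if (this.doc === null) this.doc = { id: this.nextId++, closed: false };
    return this.doc;
  }

  destroyLocalDoc(): Promise<void> {
    const captured = this.doc; // capture before clearing
    this.doc = null;           // synchronous: later callers see "no doc"
    if (captured === null) return Promise.resolve();
    // Async cleanup touches only the captured reference, never this.doc.
    return Promise.resolve().then(() => {
      captured.closed = true;
    });
  }
}
```

A wake that runs while cleanup is still pending gets a fresh handle, and the pending cleanup cannot close it because it holds only the old reference.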

Remove moduleNameMapper mocks for node-diff3 and y-indexeddb so tests run against real implementations. Add a transform rule for src/*.js files so ts-jest handles the ESM syntax in y-indexeddb.js. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Track OBSIDIAN_FILE_OPENED/OBSIDIAN_FILE_UNLOADED events to maintain an isObsidianFileOpen flag on each HSM. DiskIntegration checks this flag before executing WRITE_DISK effects, blocking writes when the editor has the file open to prevent content duplication. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Simplify invokeForkReconcile to always use diff3 via remoteDoc, removing the "remote unchanged" shortcut that bypassed three-way merge. Reset providerSynced on fork creation and RELEASE_LOCK so reconciliation waits for a fresh sync. Emit REQUEST_PROVIDER_SYNC when releasing lock with an active fork. Add patchLCAHash() to async-compute LCA hashes after reconciliation instead of blocking on the hash computation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Previously, all active documents skipped remote update injection. Now the skip is gated on enableDirectRemoteUpdates, allowing the enqueueDownload path (which fetches full server state) to work for active documents when direct injection is disabled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add ingestDiskToLocalDoc action that applies pending disk contents to localDoc. Change idle.synced DISK_CHANGED transition to re-enter idle.localAhead with ingestion instead of going straight to diverged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The guard checked for sharedFolder.remote during state change callbacks, which is no longer needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ication Adds the OpCapture module (reversible CRDT op capture) and integrates it end-to-end: persistence in IDB via y-indexeddb, test harness support, and consumption during fork reconciliation. When both disk and remote edit the same content, redundant disk ops are reversed; unique disk ops are dropped (kept in CRDT). Also fixes diff3 tokenization to use split(/(\n)/) so adjacent-line changes are handled independently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
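To illustrate the tokenization fix: splitting on a capturing group keeps each newline as its own token, so two edited lines are separated by an unchanged "\n" token instead of sitting directly next to each other. The sketch below shows only the split; `tokenizeForDiff3` is a hypothetical wrapper name, and how the tokens are passed to diff3 is not shown.

```typescript
// Hypothetical wrapper around the split described in the commit message.
// The capturing group makes split() keep the "\n" separators as tokens.
function tokenizeForDiff3(text: string): string[] {
  return text.split(/(\n)/);
}

// Joining the tokens with no separator restores the original text exactly.
function detokenize(tokens: string[]): string {
  return tokens.join("");
}
```

For "a\nb\nc" this yields the tokens "a", "\n", "b", "\n", "c".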

When a file was deleted or a shared folder was removed from settings, the per-document y-indexeddb databases (and folder-level database) were left behind as orphans in IndexedDB. This adds deleteDatabase calls in deleteFile() and SharedFolders.delete() after the in-memory objects are destroyed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Three fixes to the fork→diverged→idle-merge path:
1. clearForkKeepDiverged: clear pendingIdleUpdates (already evaluated by diff3) and restore pendingDiskContents from localDoc so the three-way merge sees the real disk content, not the LCA fallback.
2. invokeIdleThreeWayAutoMerge: read remoteDoc text directly instead of applying pendingIdleUpdates to localDoc via raw Y.applyUpdate. The raw CRDT merge causes interleaving corruption on conflicting edits. If remoteDoc isn't available yet, bail and wait for REMOTE_UPDATE to reenter idle.diverged.
3. clearForkAndUpdateLCA: clear pendingIdleUpdates for hygiene.
Also adds:
- MERGE_CONFLICT transition in active.tracking for fork conflicts detected during reconcileForkInActive
- OBSIDIAN_SAVE_FRONTMATTER / OBSIDIAN_METADATA_SYNC diagnostic events for correlating metadata editor hooks with drift events
- Regression test for the corruption scenario
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Path was stored at construction time and never updated on rename, causing stale paths in logging and debug tools. Now MergeHSM takes a getPath callback that computes the current path from the source of truth (Document.path via SharedFolder's files map). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

BackgroundSync.syncDocumentWebsocket only checked userLock before disconnecting, missing cases where the MergeHSM was actively managing the document. Also defers awareness cursor updates in RemoteSelections to avoid re-entrant EditorView.update errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Thin bridge over MergeHSM.awaitState, which is event-driven (subscribes to stateChanges and resolves as soon as its predicate matches) rather than polling. The CDP caller gets event-driven settlement with a single network round-trip instead of N ticks of "read state, check, sleep 50ms".
Function-valued predicates don't cross the Python↔JS boundary, so the API takes a string prefix (the common case is "active." or "idle.") and races against a timeout. Rejects with a diagnostic error that includes the current state path.
Consumers (RelayClient.await_hsm_state, test scripts) compose action + settlement without baking the wait into editor_open/close, keeping those primitives pure:
    await client.editor_open(path)
    await client.await_hsm_state(path, "active.")
    await client.editor_close()
    await client.await_hsm_state(path, "idle.")
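On the JS side, prefix matching raced against a timeout could be sketched like this. `TinyStateMachine` and `awaitStatePrefix` are hypothetical; the real bridge delegates to MergeHSM.awaitState, whose signature is not reproduced here.

```typescript
type StateListener = (state: string) => void;

// Hypothetical stand-in for an HSM that publishes state changes.
class TinyStateMachine {
  state: string;
  readonly listeners = new Set<StateListener>();

  constructor(initial: string) {
    this.state = initial;
  }

  subscribe(fn: StateListener): () => void {
    this.listeners.add(fn);
    return () => {
      this.listeners.delete(fn);
    };
  }

  transitionTo(next: string): void {
    this.state = next;
    // Copy first so listeners may unsubscribe during dispatch.
    Array.from(this.listeners).forEach((fn) => fn(next));
  }
}

// Resolves once the state path starts with `prefix`; rejects on timeout
// with an error that names the current state.
function awaitStatePrefix(
  machine: TinyStateMachine,
  prefix: string,
  timeoutMs: number,
): Promise<string> {
  if (machine.state.startsWith(prefix)) return Promise.resolve(machine.state);
  return new Promise<string>((resolve, reject) => {
    const timer = setTimeout(() => {
      unsubscribe();
      reject(new Error(`timed out waiting for "${prefix}"; current state: ${machine.state}`));
    }, timeoutMs);
    const unsubscribe = machine.subscribe((state) => {
      if (!state.startsWith(prefix)) return;
      clearTimeout(timer);
      unsubscribe();
      resolve(state);
    });
  });
}
```

The wait settles on the first matching transition and removes its own listener and timer, so nothing is left behind after either outcome.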

Exposes conflict inspection and resolution primitives for the test harness and live debugging:
- getConflictInfo(path): focused ConflictInfoSnapshot with base, ours, theirs, labels, and a per-hunk ConflictHunkInfo[] carrying each hunk's index, resolved flag, oursContent/theirsContent, and a content-hash `id` (jj-style minimum-unique-prefix over all hunks in the current set). Ids are derived from oursContent + \0 + theirsContent, so they survive re-parses, waits, and persisted fork restoration, but shift if the underlying conflict content changes.
- resolveConflict(path, contents): dispatches RESOLVE with final text.
- resolveHunk(path, indexOrId, 'local'|'remote'|'both'): dispatches RESOLVE_HUNK. Accepts a numeric index or any unambiguous prefix of the hunk id; throws on ambiguous or unknown ids with a candidate list.
- openDiffView(path) / cancelDiffView(path): thin wrappers over the OPEN_DIFF_VIEW and CANCEL events for driving the conflict state machine from tests.
- clearLca(path): under-the-hood LCA mutation; reproduces the no-LCA state that arises after upgrading from a plugin version without LCA tracking.
All methods go through __relayDebug.lookupDocument and hsm.send, so the state machine drives every transition.
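The minimum-unique-prefix scheme and prefix lookup can be sketched independently of the hash. The helpers below take already-computed hex ids as input; `minimumUniquePrefixes` and `resolveHunkId` are hypothetical names, and the hashing step is deliberately left out.

```typescript
// For each id, the shortest prefix that no other id in the set shares.
function minimumUniquePrefixes(ids: string[]): string[] {
  return ids.map((id, i) => {
    for (let len = 1; len <= id.length; len++) {
      const prefix = id.slice(0, len);
      const clash = ids.some((other, j) => j !== i && other.startsWith(prefix));
      if (!clash) return prefix;
    }
    return id; // a duplicate or a prefix of another id cannot be shortened
  });
}

// Resolve a user-supplied prefix to a hunk index; throw with the candidate
// list when the prefix matches zero or several ids.
function resolveHunkId(ids: string[], query: string): number {
  let found = -1;
  let count = 0;
  ids.forEach((id, i) => {
    if (id.startsWith(query)) {
      found = i;
      count++;
    }
  });
  if (count !== 1) {
    const kind = count === 0 ? "unknown" : "ambiguous";
    throw new Error(`${kind} hunk id "${query}"; candidates: ${ids.join(", ")}`);
  }
  return found;
}
```

With the ids a1f3, a1c9 and 7b20, the displayed prefixes would be a1f, a1c and 7, and the query a1 is rejected as ambiguous.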

Track live Relay plugin instances on a window-level Set so a crashed or incomplete onunload() surfaces deterministically on the next load instead of silently stacking ghost listeners. onload generates a random instance ID, checks the set for stale entries from a previous lifecycle (logging a loud console.error naming the leaked IDs), and adds its own ID. onunload deletes the instance's ID as the very last step — after auditTeardown and flushLogs — so any earlier throw leaves the ID in place as a tombstone. The next load sees it and warns. Consumers (deploy, RelayClient.verify_build) read the Set directly and assert size === 1. Anything else means the vault is running duplicated subscriptions with racing IDB writes, and tests produce phantom results.
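A sketch of the tombstone mechanism, with the host object passed in so it runs outside a browser. In the plugin the host would be window; the registry key and function names here are hypothetical.

```typescript
// Hypothetical property name under which the Set lives on the host object.
const REGISTRY_KEY = "__relayInstancesSketch";

function liveInstances(host: Record<string, unknown>): Set<string> {
  let set = host[REGISTRY_KEY] as Set<string> | undefined;
  if (!set) {
    set = new Set<string>();
    host[REGISTRY_KEY] = set;
  }
  return set;
}

// onload: report ids left behind by an earlier lifecycle, then register.
// The caller is expected to log the returned stale ids loudly.
function registerInstance(host: Record<string, unknown>, id: string): string[] {
  const set = liveInstances(host);
  const stale = Array.from(set);
  set.add(id);
  return stale;
}

// onunload: meant to be the very last step, so an earlier throw leaves
// the id in place as a tombstone.
function unregisterInstance(host: Record<string, unknown>, id: string): void {
  liveInstances(host).delete(id);
}
```

A clean load/unload cycle leaves the Set empty; an unload that throws before the final step leaves one stale id for the next load to report.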

The class accumulated subscriptions in a single flat Unsubscribe[] array without per-folder keying. Every time sharedFolders notified (folder added OR removed), the handler iterated all current folders and pushed 3 fresh subscribes per folder onto the shared array. After N notifications a single folder accumulated 3N subscriptions, and unsubs from removed folders never left the array. On plugin unload, destroy() walked the stale array and called each unsubscriber. Closures from folders whose own Observable had been destroyed (nulling _listeners) crashed with "Cannot read properties of null" in the middle of onunload, leaving the rest of the teardown chain dangling. Replace the flat array with a Map<SharedFolder, Unsubscribe[]> and a single rootSub on the SharedFolders ObservableSet. On every root notification, diff the current folder set against the tracked map: release all subscriptions for removed folders, attach new subscriptions only for newly-added ones. Plugin-global subscriptions (backgroundSync stores, workspace layout) move to a separate globalSubs list that's untouched by folder churn. destroy() releases in the correct order: root first (so no new notifications arrive mid-teardown), per-folder second, global last. Each release path wraps individual unsub calls in try/catch so a pre-destroyed observable doesn't break the rest of the chain.
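The per-folder keying and diffing can be sketched as below. `FolderSubscriptions` is a hypothetical name and the folder type is generic; the root and global subscription lists from the description are omitted for brevity.

```typescript
type Unsubscribe = () => void;

class FolderSubscriptions<F> {
  private readonly perFolder = new Map<F, Unsubscribe[]>();
  private readonly attach: (folder: F) => Unsubscribe[];

  constructor(attach: (folder: F) => Unsubscribe[]) {
    this.attach = attach;
  }

  // Called on every root notification with the current folder set.
  sync(current: F[]): void {
    const wanted = new Set<F>(current);
    Array.from(this.perFolder.keys()).forEach((folder) => {
      if (!wanted.has(folder)) this.release(folder); // removed folder
    });
    current.forEach((folder) => {
      // Attach only for folders that are not tracked yet.
      if (!this.perFolder.has(folder)) this.perFolder.set(folder, this.attach(folder));
    });
  }

  destroy(): void {
    Array.from(this.perFolder.keys()).forEach((folder) => this.release(folder));
  }

  get size(): number {
    return this.perFolder.size;
  }

  private release(folder: F): void {
    const unsubs = this.perFolder.get(folder) ?? [];
    this.perFolder.delete(folder);
    for (const unsub of unsubs) {
      try {
        unsub();
      } catch {
        // A pre-destroyed observable must not break the rest of the chain.
      }
    }
  }
}
```

Repeated notifications for an unchanged folder set attach nothing new, which is what bounds the subscription count per folder.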

Observable.destroy() cleared the listener set but never removed the instance from the module-level audit map, so every Observable ever constructed stayed strongly referenced until auditTeardown() ran at plugin unload. It also never cancelled in-flight PostOffice deliveries for its listeners, leaving a ~20ms window in which PostOffice.deliver() could invoke a recipient with a torn-down sender. Delete from the audit map and cancel each listener on Postie before clearing the set.

SharedFolder.connect() called setupEventSubscriptions() again on every reconnect, pushing another handleDocumentUpdateEvent closure onto the provider's eventCallbacks["document.updated"] array. After N reconnects the handler ran N+1 times per server event. The provider preserves eventCallbacks and eventSubscriptions across WebSocket reconnects and re-sends the server subscribe frame itself (client/provider.ts:264-267), and HasProvider.connect() reuses the same _provider instance via refreshProvider. The constructor's setupEventSubscriptions() call is the only registration we need.

super.destroy() already tears down the ydoc via HasProvider.destroyRemoteDoc(). The subsequent this.ydoc.destroy() hit the lazy getter, allocating a fresh Y.Doc, YSweetProvider, websocket, and two connection/state listeners that were then never cleaned up — leaking on every folder removal and plugin unload.

IndexeddbPersistence registers itself as a 'destroy' event handler on the ydoc, so SharedFolder.destroy() does trigger it indirectly via super.destroy() → HasProvider.destroyRemoteDoc → ydoc.destroy(). But the persistence destroy() is async — it awaits _pendingWrites and _pendingCompaction before db.close() — and the returned promise is discarded inside the synchronous event emit. On normal Obsidian unload the OS reaps everything; on plugin reload the new instance can race the old one's pending IDB close on the same database name. Call _persistence.destroy() explicitly before super.destroy() and register the promise with awaitOnReload so plugin re-enable waits for the flush to complete. Calling destroy() synchronously removes its own 'destroy' listener, so the cascade through super.destroy() does not double-fire.

The folder CRDT persistence used the bare folder GUID as the IDB database name. When two vaults share the same folder in one Obsidian process, they shared a single IDB store, leaking Y.Doc updates between vaults. Use `${appId}-relay-folder-${guid}` to match the convention used by every other IDB store in the codebase. Adds a `migrateFrom` option to IndexeddbPersistence that copies raw blobs from the old DB into the new one (without roundtripping through Y.js), then deletes the old DB. Migration runs before `fetchUpdates` so the `synced` event is only emitted after data is safely in the new store. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

initializeWithContent returns false when content was enrolled by the editor (race between uploadDoc and acquireLock). The file still needs syncing and publishing to the syncStore for cross-vault propagation. Also clears pendingUpload in deleteFile.

classifyUpdate deduplicates via tracked state vectors, making the user-based filter redundant. The server can also mislabel the user on the event, making user-based filtering unreliable.

initializeFromRemote now only enrolls CRDT bytes and sets the state vector. LCA is the caller's responsibility via the new setLCA method. downloadDoc calls setLCA after flushing to disk. GUID remap does not set LCA (disk may differ from remote).

baseStart/baseEnd from diff3 are token indices, not line numbers. The line-map lookup produced wrong positions. String search for oursContent in localContent is reliable for unique hunk text.
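A minimal sketch of positioning by string search. `locateHunk` is a hypothetical helper; treating a repeated match as unlocatable is an assumption made here to reflect the "unique hunk text" caveat, not something the commit states.

```typescript
// Find the character range of a hunk's "ours" text inside the local content.
// Returns null when the text is empty, absent, or occurs more than once.
function locateHunk(
  localContent: string,
  oursContent: string,
): { from: number; to: number } | null {
  if (oursContent.length === 0) return null;
  const from = localContent.indexOf(oursContent);
  if (from === -1) return null;
  // Assumption: a second occurrence makes the position ambiguous.
  if (localContent.indexOf(oursContent, from + 1) !== -1) return null;
  return { from, to: from + oursContent.length };
}
```

The returned offsets are character positions in localContent, not token indices.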

dtkav and others added 30 commits February 16, 2026 16:03

refactor: Make state machine declarative

a1c57d4

fix: normalize DRIFT_DETECTED log format to match HSMLogEntry

cc11f45

feat: add Yjs update validation

6359088

fix: remove unnecessary destroy-time guard in HSM state subscriber

5c2c08c

dtkav added 3 commits April 9, 2026 16:30

dtkav force-pushed the merge-hsm branch from 2b6eccf to 783da10 Compare April 9, 2026 23:39

dtkav added 6 commits April 9, 2026 16:41

feat: add SharedFolder.onDestroy for external lifetime hooks

ef108d3

dtkav force-pushed the merge-hsm branch from 783da10 to 17db430 Compare April 9, 2026 23:41

mgmobrien and others added 17 commits April 10, 2026 14:24

ci: rename e2e workflow to standard suite (#81)

f7f20d7

ci: force VM harness clone to use deploy key (#82)

e850fe4

ci: pin GitHub host keys in standard workflow (#83)

4f29946

ci: pin GitHub host keys in standard workflow (#84)

c1b7341

ci: add targeted standard-suite reruns (#85)

835e785

ci: raise Linux standard-suite timeout budget (#86)

9f18904

feat: send device ID in token and file-token request bodies

ac67967

fix: set wsconnected=false immediately on provider disconnect

b537d98

fix: reconnect provider in acquireLock after idle fork reconciliation

3422eb2

fix: add destroyed flag to Canvas and SyncFile

4efa0ef

fix: remove user-based echo filter in handleDocumentUpdateEvent

ce9aae1

fixup! refactor: simplify initializeFromRemote, add setLCA

75690bd

fix: use string search for conflict hunk positioning

f6e00b3

fix: sync lastKnownEditorText after resolveHunk DISPATCH_CM6

69aa55e

dtkav force-pushed the merge-hsm branch from 70ef805 to 69aa55e Compare April 11, 2026 09:27

ci: clear stale repo-local slots before Linux run (#87)

8a811be

dtkav wants to merge 352 commits into main from merge-hsm

2 participants
