fix: initialize quorum connections on startup#7240
fix: initialize quorum connections on startup#7240PastaPastaPasta merged 4 commits intodashpay:developfrom
Conversation
After PR dashpay#7063 (commit 1360d9d), CQuorumManager::UpdatedBlockTip was moved from CDSNotificationInterface::UpdatedBlockTip (which is called directly at startup via InitializeCurrentBlockTip) to the ActiveContext/ObserverContext CValidationInterface subscribers. These subscribers only receive UpdatedBlockTip via GetMainSignals() broadcast, which is never triggered at startup on idle chains — ActivateBestChain early-returns when pindexMostWork == m_chain.Tip(). This means that after a full restart with no new blocks: - QuorumObserver::UpdatedBlockTip never fires - CheckQuorumConnections is never called - Quorum connections are never re-established - MNs cannot exchange signing shares - InstantSend and ChainLock signing fails Fix by adding InitializeCurrentBlockTip() to ActiveContext and ObserverContext, called from the loadblk thread right after CDSNotificationInterface::InitializeCurrentBlockTip(). This method also calls the new QuorumObserver::InitializeQuorumConnections(), which bypasses the IsBlockchainSynced() guard (mnsync is not yet complete at that point in the startup sequence). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests that InstantSend works after a full cluster restart without mining any new blocks. This exercises the quorum connection re-establishment path added in the previous commit. The test: - Funds a sender, chainlocks the tip - Restarts all nodes (simple nodes and masternodes) - Reconnects all nodes to node 0 - Bumps mocktime past WAIT_FOR_ISLOCK_TIMEOUT - Sends a transaction and verifies it gets an IS lock Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
✅ No Merge Conflicts DetectedThis PR currently has no conflicts with other open PRs. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c25c61a8d9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughAdds explicit initialization entry points for the current block tip: Sequence Diagram(s)sequenceDiagram
participant AppInitMain as AppInitMain (init.cpp)
participant ChainMan as ChainManager
participant ActiveCtx as ActiveContext / ObserverContext
participant QMan as QuorumManager (qman_handler)
participant QObserver as QuorumObserver
AppInitMain->>ChainMan: ActiveTip() / IsInitialBlockDownload()
ChainMan-->>AppInitMain: tip, ibd
AppInitMain->>ActiveCtx: InitializeCurrentBlockTip(tip, ibd)
activate ActiveCtx
ActiveCtx->>ActiveCtx: UpdatedBlockTip(tip, nullptr, ibd)
alt tip != nullptr
ActiveCtx->>QMan: InitializeQuorumConnections(tip)
activate QMan
QMan->>QObserver: InitializeQuorumConnections(tip)
activate QObserver
QObserver->>QObserver: for each llmq: CheckQuorumConnections(params, tip)
deactivate QObserver
deactivate QMan
end
deactivate ActiveCtx
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Clean, well-targeted fix for a real regression where quorum connections were not re-established after a full restart on idle chains. The new InitializeQuorumConnections method correctly bypasses the IsBlockchainSynced() guard, and the functional test directly exercises the fix scenario. No blocking issues found; two minor suggestions for code hygiene and test robustness.
Reviewed commit: c25c61a
🟡 2 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/active/context.cpp`:
- [SUGGESTION] lines 100-106: Identical InitializeCurrentBlockTip bodies in ActiveContext and ObserverContext
The method body is identical in ActiveContext (lines 100-106) and ObserverContext (src/llmq/observer/context.cpp:50-56): both call UpdatedBlockTip then qman_handler->InitializeQuorumConnections. The differentiated behavior comes from two layers: (1) UpdatedBlockTip itself differs — ActiveContext also calls nodeman, ehf_sighandler, gov_signer — and (2) InitializeQuorumConnections dispatches through the virtual CheckQuorumConnections (QuorumParticipant overrides QuorumObserver). Both classes are `final` and share only CValidationInterface as a base, so extracting a common implementation isn't straightforward. Acceptable duplication for now, but worth noting if more initialization steps are added.
In `test/functional/p2p_instantsend.py`:
- [SUGGESTION] line 199: Post-restart IS lock only verified on sender; verifying on receiver would strengthen the test
The pre-restart funding TX verifies IS locks on all nodes (lines 157-158), but the post-restart TX only checks the sender (line 199). Since the purpose of this test is to confirm quorum signing infrastructure works across the cluster after restart, verifying the lock propagates to at least the receiver would provide stronger assurance.
| void ActiveContext::InitializeCurrentBlockTip(const CBlockIndex* tip, bool ibd) | ||
| { | ||
| UpdatedBlockTip(tip, nullptr, ibd); | ||
| if (tip) { | ||
| qman_handler->InitializeQuorumConnections(tip); | ||
| } | ||
| } |
There was a problem hiding this comment.
🟡 Suggestion: Identical InitializeCurrentBlockTip bodies in ActiveContext and ObserverContext
The method body is identical in ActiveContext (lines 100-106) and ObserverContext (src/llmq/observer/context.cpp:50-56): both call UpdatedBlockTip then qman_handler->InitializeQuorumConnections. The differentiated behavior comes from two layers: (1) UpdatedBlockTip itself differs — ActiveContext also calls nodeman, ehf_sighandler, gov_signer — and (2) InitializeQuorumConnections dispatches through the virtual CheckQuorumConnections (QuorumParticipant overrides QuorumObserver). Both classes are final and share only CValidationInterface as a base, so extracting a common implementation isn't straightforward. Acceptable duplication for now, but worth noting if more initialization steps are added.
source: ['claude']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/active/context.cpp`:
- [SUGGESTION] lines 100-106: Identical InitializeCurrentBlockTip bodies in ActiveContext and ObserverContext
The method body is identical in ActiveContext (lines 100-106) and ObserverContext (src/llmq/observer/context.cpp:50-56): both call UpdatedBlockTip then qman_handler->InitializeQuorumConnections. The differentiated behavior comes from two layers: (1) UpdatedBlockTip itself differs — ActiveContext also calls nodeman, ehf_sighandler, gov_signer — and (2) InitializeQuorumConnections dispatches through the virtual CheckQuorumConnections (QuorumParticipant overrides QuorumObserver). Both classes are `final` and share only CValidationInterface as a base, so extracting a common implementation isn't straightforward. Acceptable duplication for now, but worth noting if more initialization steps are added.
Move InitializeCurrentBlockTip for active_ctx (masternode mode) to after nodeman->Init() so that GetProTxHash() is guaranteed to be set before EnsureQuorumConnections runs. Currently this works by accident: UpdatedBlockTip calls Init() internally when the MN state is not READY. But this is a non-obvious side effect and would silently break quorum connections if someone refactored UpdatedBlockTip to add an early return. Also, if IsInitialBlockDownload() is true, UpdatedBlockTip returns early and Init() never runs, leaving a null proTxHash for quorum setup. Making the dependency explicit improves robustness without changing observable behavior in normal operation. Also add a MN-to-MN quorum connection assertion in the restart test to verify that quorum connections form between masternodes after restart, not just that IS locks work (which can succeed via concentrated sigshare sending over any authenticated connection). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
62b970c to
5a43003
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a430037bd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| self.bump_mocktime(30) | ||
| self.sync_mempools() | ||
| for node in self.nodes: | ||
| self.wait_for_instantlock(is_id, node) | ||
| self.log.info("InstantSend lock succeeded after full restart") |
There was a problem hiding this comment.
should drop bump_mocktime/sync-mempools when https://github.com/dashpay/dash/pull/7241/files is merged
| self.log.info("InstantSend lock succeeded after full restart") | ||
|
|
||
| # clean up | ||
| receiver.sendtoaddress(self.nodes[0].getnewaddress(), 0.5, "", "", True) |
There was a problem hiding this comment.
nit: instead "", "" consider using named arguments here
| void QuorumObserver::InitializeQuorumConnections(gsl::not_null<const CBlockIndex*> pindexNew) const | ||
| { | ||
| for (const auto& params : Params().GetConsensus().llmqs) { | ||
| CheckQuorumConnections(params, pindexNew); |
There was a problem hiding this comment.
PR descriptions says that:
This method also calls the new QuorumObserver::InitializeQuorumConnections(), which bypasses the IsBlockchainSynced() guard (mnsync is not yet complete at that point in the startup sequence).
But m_mn_sync.IsBlockchainSynced and m_mn_sync.IsSynced is not the same things.
If blocks are not synced, we should not start masternode connections yet, because internal state is not full enough at this stage yet; isn't it true?
There was a problem hiding this comment.
ThreadOpenMasternodeConnections already has its own IsBlockchainSynced() guard, so even though InitializeQuorumConnections populates the quorum node maps at startup, no actual connections are opened until blockchain sync completes.
For the common case this patch targets — an up-to-date masternode that just restarted on an idle chain — the tip is current, so the quorum node maps are correct by the time ThreadOpenMasternodeConnections acts on them.
For the rare case of a masternode that's far behind, the maps could potentially contain stale entries. There's a brief window where ThreadOpenMasternodeConnections could attempt connections to stale quorum nodes, but a masternode that far behind has bigger problems (missed DKGs, likely PoSe-banned).
There was a problem hiding this comment.
it is still a bit confusing for me, but I guess it's much better than broken develop. I ack'ed it
The merge of dashpay#7240 and dashpay#7241 left two call sites in test_instantsend_after_restart using the old positional signature wait_for_instantlock(txid, node). After dashpay#7241 changed the signature to *txids with keyword-only nodes=, the node object was silently consumed as a second txid, causing the wait to always time out. Use the new consolidated API which handles mempool sync and checks all nodes by default.
…_after_restart c885b55 test: fix wait_for_instantlock calls in test_instantsend_after_restart (PastaClaw) Pull request description: ## Summary Fix two broken `wait_for_instantlock` call sites in `test_instantsend_after_restart` that were missed during the merge of #7240 and #7241. ## Problem The merge of #7240 (added `test_instantsend_after_restart` with old-style calls) and #7241 (refactored `wait_for_instantlock(txid, node)` → `wait_for_instantlock(*txids, nodes=None)`) left two call sites using the old positional signature: ```python for node in self.nodes: self.wait_for_instantlock(txid, node) # node consumed as second *txid ``` The `node` object gets silently swallowed into `*txids` as a second "txid", causing `getrawtransaction(node_object, True)` to fail and the wait to time out after ~282 seconds. This breaks `p2p_instantsend.py` on every CI run. ## Fix Replace both loops with the new consolidated API which handles mempool sync and checks all nodes by default: ```python self.wait_for_instantlock(txid) ``` ACKs for top commit: PastaPastaPasta: ACK c885b55 Tree-SHA512: d53c39491ba67697ece0f00393407a1fdc1d87e3bd495d4370c23491f9acef26c8a57bc9085ac007b994adf7903f89fa65e0962f5cbd71b32d4a8dfa2af45d81
Issue being fixed or feature implemented
After PR #7063 (commit 1360d9d),
CQuorumManager::UpdatedBlockTipwas moved fromCDSNotificationInterface::UpdatedBlockTip(which is called directly at startup viaInitializeCurrentBlockTip) to theActiveContext/ObserverContextCValidationInterfacesubscribers. These subscribers only receiveUpdatedBlockTipviaGetMainSignals()broadcast, which is never triggered at startup on idle chains —ActivateBestChainearly-returns whenpindexMostWork == m_chain.Tip().This means that after a full restart with no new blocks:
QuorumObserver::UpdatedBlockTipnever firesCheckQuorumConnectionsis never calledWhat was done?
Fix by adding
InitializeCurrentBlockTip()toActiveContextandObserverContext, called from theloadblkthread right afterCDSNotificationInterface::InitializeCurrentBlockTip(). This method also calls the newQuorumObserver::InitializeQuorumConnections(), which bypasses theIsBlockchainSynced()guard (mnsync is not yet complete at that point in the startup sequence).How Has This Been Tested?
See new "IS after MNs restart" test which fails on develop and passes with the patch applied
Breaking Changes
n/a
Checklist: