refactor: simplify architecture of quorum observer and quorum participant #7247
knst wants to merge 16 commits into dashpay:develop
Conversation
Walkthrough
The patch replaces the previous QuorumObserver/QuorumParticipant model with a new llmq::QuorumRole abstraction and a NetQuorum network handler. CQuorumManager no longer depends on observer inheritance and exposes request/validation APIs (RegisterDataRequest, ValidateDataResponse, GetCachedMutableQuorum, WriteContributions). ActiveContext and ObserverContext now implement QuorumRole hooks; many observer/quorum observer sources and headers were removed or consolidated. CDKGSession gained protected visibility and hash-based lookup accessors. Validation/initialization callbacks were added to dispatch an InitializeCurrentBlockTip callback. Build sources and exported headers were updated accordingly.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Chain as Blockchain
    participant NetQuorum as llmq::NetQuorum
    participant QMan as CQuorumManager
    participant Worker as WorkerPool
    participant Peers as Network/Connman
    participant DB as QuorumDataDB
    Chain->>NetQuorum: InitializeCurrentBlockTip(tip, ibd) / UpdatedBlockTip(...)
    NetQuorum->>QMan: CheckQuorumConnections(llmqParams, pindexNew)
    QMan-->>NetQuorum: quorum list / connection decisions
    NetQuorum->>NetQuorum: TriggerQuorumDataRecoveryThreads(quorum)
    NetQuorum->>Worker: Start DataRecoveryThread(request missing data)
    Worker->>Peers: Send QGETDATA requests
    Peers-->>NetQuorum: Receive QDATA responses
    NetQuorum->>QMan: ValidateDataResponse(requestKey, response)
    alt response OK
        NetQuorum->>DB: WriteContributions(quorum)
        NetQuorum->>NetQuorum: Update quorum secret share
    else mismatch / duplicate
        NetQuorum->>Peers: Treat as invalid / ignore
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 5
🧹 Nitpick comments (1)
src/llmq/dkgsession.h (1)
281-339: Prefer narrower hooks over widening most of the session state to protected.
This trades a small set of explicit friends for subclass access to invCs, the message maps, and most of the mutable DKG session state. I'd keep as much of this private as possible and add a few targeted hooks/accessors for the remaining subclass needs, so the locking and ordering invariants stay localized.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/dkgsession.h` around lines 281 - 339, Make the large protected block private and provide narrow accessor/hooks for subclasses instead of exposing most session state; keep invCs, cs_pending, and the message maps (contributions, complaints, justifications, prematureCommitments, validCommitments) private and add specific thread-safe methods (e.g., bool TryGetContribution(uint256 hash, CDKGContribution& out), void ForEachContributionLocked(fn), AddComplaint/RemoveComplaint, ValidateAndMarkCommitment, GetPendingContributionVerifications(), etc.) that acquire the proper locks and expose only the minimal operations subclasses need; similarly keep members, membersMap, relayMembers, vvecContribution, receivedVvecs, vecEncryptedContributions, m_sk_contributions, myProTxHash/myId/myIdx private and add targeted getters/setters or iteration hooks (e.g., GetMemberByIndex, FindMemberId, AppendEncryptedContribution) to preserve invariants and localize locking around invCs and cs_pending while removing the broad protected visibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/active/context.cpp`:
- Line 40: The file uses standard algorithms (std::sort and std::ranges::any_of)
but only includes <ranges>, which risks build breakage; add a direct include for
<algorithm> at the top of the file (alongside the existing `#include <ranges>`) so
that usages in functions referencing std::sort and std::ranges::any_of are
guaranteed to compile across STL implementations.
In `@src/llmq/observer.cpp`:
- Around line 35-38: The destructor ObserverContext::~ObserverContext is not
formatted per project style; run clang-format (or apply the repository's
formatting tool) on the hunk containing ObserverContext::~ObserverContext and
the call to m_qman.DisconnectManagers() so the braces, indentation and spacing
match the project's clang-format rules and the diff-format job will pass.
In `@src/llmq/quorumsman.cpp`:
- Around line 772-775: The QuorumRole destructor currently calls Stop()
(QuorumRole::~QuorumRole()) which can run after derived members are destroyed
and allows ActiveContext's workerPool jobs to dereference torn-down derived
state; remove the Stop() call from the base destructor and instead
enforce/verify that shutdown happened earlier by asserting or logging if Stop()
wasn't already invoked. Locate where derived classes or owners create/own
ActiveContext/workerPool and ensure they call Stop() (or otherwise quiesce
workerPool) before derived member teardown; add an assertion in
QuorumRole::~QuorumRole() that Stop() has already been called (or that the
workerPool is quiesced) to fail-fast if shutdown ordering is violated. Ensure
any code paths that currently rely on the base destructor to stop workers are
updated to call Stop() explicitly before destruction.
- Around line 913-918: The loop currently increments nTries before calling
m_qman.IsDataRequestPending and then continues without hitting the 1s sleep,
which burns attempts on pending requests; fix by not incrementing nTries until
you actually make a request: compute pCurrentMemberHash using (start_offset +
nTries) without ++, call m_qman.IsDataRequestPending(*pCurrentMemberHash,...),
and only increment nTries (or advance start_offset) when you proceed to send a
request; alternatively ensure the continue path still triggers the per-iteration
sleep — adjust the logic around pCurrentMemberHash, nTries, and the continue so
IsDataRequestPending does not consume an attempt and the loop still sleeps on
pending checks.
In `@src/llmq/quorumsman.h`:
- Line 75: Add a forward declaration for llmq::QuorumRole before the
CQuorumManager class so the member llmq::QuorumRole* m_handler (and any other
uses of llmq::QuorumRole in CQuorumManager) are valid; specifically, insert
"namespace llmq { class QuorumRole; }" (or equivalent forward declaration) above
the CQuorumManager definition to resolve the use-before-declaration of
llmq::QuorumRole referenced by m_handler and any other members/methods in
CQuorumManager.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f51bb0d0-4ebb-4427-8fb1-dc54321440d3
📒 Files selected for processing (20)
src/Makefile.am, src/active/context.cpp, src/active/context.h, src/active/dkgsessionhandler.cpp, src/active/quorums.cpp, src/active/quorums.h, src/init.cpp, src/llmq/dkgsession.cpp, src/llmq/dkgsession.h, src/llmq/observer.cpp, src/llmq/observer.h, src/llmq/observer/quorums.cpp, src/llmq/observer/quorums.h, src/llmq/quorums.h, src/llmq/quorumsman.cpp, src/llmq/quorumsman.h, src/net_processing.cpp, src/node/context.cpp, src/rpc/quorums.cpp, test/util/data/non-backported.txt
💤 Files with no reviewable changes (5)
- test/util/data/non-backported.txt
- src/active/quorums.cpp
- src/llmq/observer/quorums.cpp
- src/llmq/observer/quorums.h
- src/active/quorums.h
thepastaclaw left a comment
Code Review
Clean, well-structured refactoring that successfully collapses the QuorumObserver/QuorumParticipant/QuorumObserverParent hierarchy into a single QuorumRole base class with two concrete subclasses (ObserverContext and ActiveContext). The code movement is mechanically correct with no logic changes, CDKGSession friend removal is properly replaced with getters and protected access, and the destruction/initialization ordering is preserved.
Reviewed commit: 40ab4f9
🟡 2 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/llmq/quorumsman.h`:
- [SUGGESTION] lines 244-249: CheckQuorumConnections and TriggerQuorumDataRecoveryThreads should be protected, not public
These two virtual methods are internal implementation hooks of the Template Method pattern — they're called by `UpdatedBlockTip` and overridden by `ActiveContext`, but never called externally through a `QuorumRole*`. In the old `QuorumObserver` class they were `protected virtual`. Making them public unnecessarily exposes internal state management.
In `ActiveContext::context.h` these overrides are correctly in the `protected` section (line 83–87), which reinforces that this is the intended visibility.
In `src/active/context.cpp`:
- [SUGGESTION] lines 283-286: Worker pool lifetime vs ActiveContext member lifetime during abnormal destruction
The worker pool now lives in the `QuorumRole` base class, while tasks queued via `StartDataRecoveryThread` capture `this` (an `ActiveContext*`) and dereference `nodeman` (line 285). C++ destruction order is: derived destructor → member destruction → base destructor. So if `ActiveContext` is ever destroyed without a prior explicit `Stop()` call, there's a window where `nodeman` is already destroyed but the worker pool hasn't been stopped yet — `QuorumRole::~QuorumRole()` calls `Stop()` too late.
In practice the normal shutdown path (`init.cpp:293`) always calls `Stop()` before destruction, and `QuorumRole::~QuorumRole()` has a safety-net `Stop()` call. But in the old code, the worker pool was inside the owned `qman_handler` member, so it was destroyed in member-destruction order alongside the other members — there was no split between member lifetimes and pool lifetime. This refactor introduced a subtle ordering change that could bite during exception-unwind or future refactors.
Consider adding an `assert` or explicit `Stop()` in `ActiveContext::~ActiveContext()` to make the contract self-enforcing.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 54d9a6b83a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
This pull request has conflicts, please rebase.
Force-pushed 54d9a6b to 323c98a
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/llmq/quorumsman.cpp (2)
772-775: ⚠️ Potential issue | 🟠 Major
Base destructor calls Stop(), which may cause UAF issues.
The base class destructor (QuorumRole::~QuorumRole()) calls Stop(), but base destructors run after derived member destruction. Since ActiveContext queues workerPool jobs that may reference derived state, this creates a shutdown-time use-after-free window. Consider asserting that Stop() was already called, rather than calling it from the destructor:

```diff
 QuorumRole::~QuorumRole()
 {
-    Stop();
+    // Derived classes must call Stop() before destruction
+    assert(workerPool.size() == 0 || !workerPool.is_running());
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/quorumsman.cpp` around lines 772 - 775, The base destructor QuorumRole::~QuorumRole() must not call Stop() because derived state may already be destroyed and ActiveContext/workerPool tasks can reference that derived state; remove the Stop() call from QuorumRole::~QuorumRole() and instead assert that Stop() was already invoked (e.g. check a boolean like m_running or a finished/shutdown flag) to catch incorrect shutdown ordering, or move the Stop() invocation into derived class destructors where derived resources are still valid; reference QuorumRole::~QuorumRole, Stop(), and ActiveContext/workerPool when making the change.
912-922: ⚠️ Potential issue | 🟠 Major
Recovery loop burns attempts on pending requests without sleeping.
When IsDataRequestPending() returns true, the loop increments nTries and continues without hitting the 1-second sleep at the bottom. This can rapidly exhaust all members and report "All tried but failed" without actually waiting for pending requests to complete. Consider not incrementing nTries until a request is actually sent:

```diff
-            if ((GetTime<std::chrono::seconds>().count() - nTimeLastSuccess) > nRequestTimeout) {
-                if (nTries >= vecMemberHashes.size()) {
-                    printLog("All tried but failed");
-                    break;
-                }
-                pCurrentMemberHash = &vecMemberHashes[(start_offset + nTries++) % vecMemberHashes.size()];
-                if (m_qman.IsDataRequestPending(*pCurrentMemberHash, /*we_requested=*/true, pQuorum->qc->quorumHash,
-                                                pQuorum->qc->llmqType)) {
-                    printLog("Already asked");
-                    continue;
-                }
+            if ((GetTime<std::chrono::seconds>().count() - nTimeLastSuccess) > nRequestTimeout) {
+                // Find next member without a pending request
+                size_t attempts = 0;
+                while (attempts < vecMemberHashes.size()) {
+                    pCurrentMemberHash = &vecMemberHashes[(start_offset + nTries) % vecMemberHashes.size()];
+                    if (!m_qman.IsDataRequestPending(*pCurrentMemberHash, /*we_requested=*/true, pQuorum->qc->quorumHash,
+                                                     pQuorum->qc->llmqType)) {
+                        break;
+                    }
+                    nTries++;
+                    attempts++;
+                }
+                if (attempts >= vecMemberHashes.size()) {
+                    printLog("All pending or tried");
+                    // Don't break - wait for pending to complete
+                }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/quorumsman.cpp` around lines 912 - 922, The loop currently increments nTries when selecting pCurrentMemberHash before calling IsDataRequestPending, which causes pending-request branches to still consume attempts and skip the 1s sleep; change the logic in the recovery loop around IsDataRequestPending so that nTries is only incremented when an actual request is sent (i.e., move the nTries++ to after the code path that sends a request or use a separate index variable for probing and only increment nTries on a real send), keep the IsDataRequestPending check as-is (call on *pCurrentMemberHash), and ensure that when the branch finds a pending request it does not continue past the sleep (so the loop will hit the existing 1-second sleep) and preserve the existing printLog(...) messages (e.g., "Already asked" / "All tried but failed").
🧹 Nitpick comments (1)
src/active/context.cpp (1)
182-182: Redundant assertion on gsl::not_null parameter.
The block_index parameter is declared as gsl::not_null<const CBlockIndex*>, which already guarantees the pointer is non-null at the call site. This assertion is unnecessary.
Proposed removal:

```diff
     if (request.GetDataMask() & llmq::CQuorumDataRequest::ENCRYPTED_CONTRIBUTIONS) {
-        assert(block_index);
         int memberIdx = quorum.GetMemberIndex(request.GetProTxHash());
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/active/context.cpp` at line 182, Remove the redundant runtime assertion for the non-null pointer: delete the call to assert(block_index); in the function that takes gsl::not_null<const CBlockIndex*> block_index (i.e., where the parameter is declared as gsl::not_null<const CBlockIndex*> block_index). The gsl::not_null type already enforces non-null at call sites, so simply remove the assert and leave the parameter usage as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/active/context.cpp`:
- Line 173: The modulo expression "nIndex % quorum.qc->validMembers.size()" can
divide by zero if quorum.qc->validMembers is empty; add a defensive check before
the modulo (inspect quorum.qc and quorum.qc->validMembers.empty()) and handle
the empty case (e.g., return 0 or throw/log a clear error) so you never perform
the modulo on size() == 0; update the function containing that return to use
this guard and keep the rest of the logic unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a10c1fef-7887-4c73-b10b-d5db9bfee1be
📒 Files selected for processing (21)
src/Makefile.am, src/active/context.cpp, src/active/context.h, src/active/dkgsession.cpp, src/active/dkgsessionhandler.cpp, src/active/quorums.cpp, src/active/quorums.h, src/init.cpp, src/llmq/dkgsession.cpp, src/llmq/dkgsession.h, src/llmq/observer.cpp, src/llmq/observer.h, src/llmq/observer/quorums.cpp, src/llmq/observer/quorums.h, src/llmq/quorums.h, src/llmq/quorumsman.cpp, src/llmq/quorumsman.h, src/net_processing.cpp, src/node/context.cpp, src/rpc/quorums.cpp, test/util/data/non-backported.txt
💤 Files with no reviewable changes (5)
- test/util/data/non-backported.txt
- src/llmq/observer/quorums.cpp
- src/active/quorums.h
- src/active/quorums.cpp
- src/llmq/observer/quorums.h
✅ Files skipped from review due to trivial changes (2)
- src/node/context.cpp
- src/net_processing.cpp
🚧 Files skipped from review as they are similar to previous changes (4)
- src/rpc/quorums.cpp
- src/active/dkgsession.cpp
- src/llmq/quorums.h
- src/init.cpp
src/active/context.cpp (Outdated)

```cpp
        }
    }
    return nIndex % quorum.qc->validMembers.size();
```
Potential division by zero if validMembers is empty.
If quorum.qc->validMembers.size() is 0, this modulo operation causes undefined behavior. While a properly initialized quorum should always have valid members, adding a defensive check would prevent crashes from edge cases or future refactoring.
Proposed defensive check:

```diff
-    return nIndex % quorum.qc->validMembers.size();
+    const size_t validMembersCount = quorum.qc->validMembers.size();
+    assert(validMembersCount > 0);
+    return nIndex % validMembersCount;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/active/context.cpp` at line 173, The modulo expression "nIndex %
quorum.qc->validMembers.size()" can divide by zero if quorum.qc->validMembers is
empty; add a defensive check before the modulo (inspect quorum.qc and
quorum.qc->validMembers.empty()) and handle the empty case (e.g., return 0 or
throw/log a clear error) so you never perform the modulo on size() == 0; update
the function containing that return to use this guard and keep the rest of the
logic unchanged.
I don't think this is a reachable current-SHA bug. The helper moved to net_quorum.cpp, and its callers return early when quorum.qc->validMembers is empty before computing the recovery offset.
🧩 Analysis chain
Let me verify the caller guards in net_quorum.cpp.
🏁 Script executed:

```shell
#!/bin/bash
# Check callers of GetQuorumRecoveryStartOffset and validMembers guards in net_quorum.cpp
rg -n "GetQuorumRecoveryStartOffset|validMembers" --type cpp -C3
```
@thepastaclaw, confirmed. Both callers of GetQuorumRecoveryStartOffset in src/llmq/net_quorum.cpp explicitly guard against the empty case before computing the offset:
- StartVvecSyncThread (line 587): if (pQuorum->qc->validMembers.empty()) return;
- StartSkShareRecoveryThread (line 620): if (pQuorum->qc->validMembers.empty()) return;
The modulo-by-zero path is indeed unreachable. Thanks for the clarification — withdrawing the concern.
Force-pushed 323c98a to 98399b4
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/llmq/quorumsman.h (1)
135-159: ⚠️ Potential issue | 🟡 Minor
Re-run clang-format on the new declarations.
CI is already red here; these updated declarations and virtual signatures do not match the repository formatting yet.
Also applies to: 235-256
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/quorumsman.h` around lines 135 - 159, Re-run clang-format on the updated declarations in quorumsman.h and fix formatting to match the repository style for the following declarations: RequestQuorumData, GetQuorum, both ScanQuorums overloads, IsMasternode, IsWatching, IsDataRequestPending, GetDataRequestStatus, CleanupExpiredDataRequests, and CleanupOldQuorumData; ensure spacing, line breaks, parameter wrapping and the placement/spacing of the EXCLUSIVE_LOCKS_REQUIRED annotations match existing code style (and reformat the other affected virtual/signature blocks in the file the same way).
♻️ Duplicate comments (4)
src/llmq/quorumsman.cpp (2)
772-775: ⚠️ Potential issue | 🟠 Major
Don't make the base destructor the first place that stops the quorum pool.
QuorumRole::~QuorumRole() runs after derived destructors, so it is too late to be the primary shutdown point for queued jobs. Stop() needs to happen before ObserverContext/ActiveContext tear down their own state; the base destructor should only verify that shutdown already happened.
Please verify that derived teardown already stops the shared pool before member destruction and that the shared pool still runs derived work. Expected: an explicit early Stop() in derived destructors or owners; otherwise this remains a shutdown-time lifetime hazard.

```shell
#!/bin/bash
set -euo pipefail
echo "Base shutdown path:"
rg -n -C2 'QuorumRole::~QuorumRole|void QuorumRole::Stop|void QuorumRole::Start' src/llmq/quorumsman.cpp
echo
echo "Derived destructors / explicit Stop() calls:"
rg -n -C2 '(ActiveContext|ObserverContext)::(~|Stop\()' src/active src/llmq
echo
echo "Shared quorum-pool jobs:"
rg -n -C3 'workerPool\.push|TriggerQuorumDataRecoveryThreads|StartDataRecoveryThread|StartVvecSyncThread' src/active src/llmq
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/quorumsman.cpp` around lines 772 - 775, The base destructor QuorumRole::~QuorumRole currently calls Stop() too late (after derived destructors run); move the primary shutdown of the shared worker pool out of the base destructor and ensure derived classes (e.g., ObserverContext and ActiveContext) explicitly call Stop() early in their own destructors or owners perform an early Stop() before tearing down derived members; update or add explicit Stop() calls in ActiveContext::~ActiveContext and ObserverContext::~ObserverContext (or their owning shutdown paths) so the workerPool and methods like TriggerQuorumDataRecoveryThreads/StartDataRecoveryThread/StartVvecSyncThread are stopped before derived member destruction, and keep QuorumRole::~QuorumRole as a defensive check that verifies shutdown already occurred rather than performing it.
917-923: ⚠️ Potential issue | 🟠 Major
Pending requests shouldn't burn recovery attempts.
nTries is incremented before IsDataRequestPending(), and the continue skips the normal 1-second backoff. A few outstanding requests can make this loop reach "All tried but failed" without actually giving any of them time to complete.
Proposed fix:

```diff
-            pCurrentMemberHash = &vecMemberHashes[(start_offset + nTries++) % vecMemberHashes.size()];
+            pCurrentMemberHash = &vecMemberHashes[(start_offset + nTries) % vecMemberHashes.size()];
             if (m_qman.IsDataRequestPending(*pCurrentMemberHash, /*we_requested=*/true, pQuorum->qc->quorumHash,
                                             pQuorum->qc->llmqType)) {
                 printLog("Already asked");
+                quorumThreadInterrupt.sleep_for(std::chrono::seconds(1));
                 continue;
             }
+            ++nTries;
             quorumThreadInterrupt.sleep_for(std::chrono::milliseconds(start_offset * 100));
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/quorumsman.cpp` around lines 917 - 923, The loop currently increments nTries before calling m_qman.IsDataRequestPending and then uses continue, which consumes an attempt without waiting; to fix, compute the candidate member index using a temporary variable (e.g., idx = (start_offset + nTries) % vecMemberHashes.size()) and call IsDataRequestPending with that candidate without incrementing nTries, then only increment nTries when you actually perform an attempt (after passing the pending check) or otherwise perform the 1-second backoff before continuing; adjust quorumThreadInterrupt.sleep_for usage so the pending-case flows through the same backoff path instead of skipping it.
src/llmq/quorumsman.h (1)
75-75: ⚠️ Potential issue | 🔴 Critical
Forward-declare QuorumRole before CQuorumManager.
Line 75 and Line 111 use llmq::QuorumRole before the class is declared at Line 197, so this header is no longer self-contained and can fail to parse when included on its own. Add a forward declaration next to the other llmq forward declarations before CQuorumManager.
Minimal fix:

```diff
 class CDKGSessionManager;
 class CQuorumBlockProcessor;
 class CQuorumSnapshotManager;
+class QuorumRole;
```
Also applies to: 111-111
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/quorumsman.h` at line 75, Add a forward declaration for llmq::QuorumRole before the declaration of CQuorumManager so the header is self-contained: locate the block with the other llmq forward declarations (where CQuorumManager is declared) and add "namespace llmq { class QuorumRole; }" (or equivalent) so members like llmq::QuorumRole* m_handler and any use at lines referencing QuorumRole compile without needing the full class definition.
src/active/context.cpp (1)
291-294: ⚠️ Potential issue | 🟠 Major
Stop the recovery pool before these jobs can outlive ActiveContext.
Lines 291-294 capture this and dereference nodeman, but workerPool now lives in QuorumRole. If destruction ever happens without an earlier explicit Stop(), this lambda can run after ActiveContext members start tearing down. Make shutdown self-enforcing here by stopping the pool before member destruction, or assert that shutdown already did.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/active/context.cpp` around lines 291 - 294, The lambda pushed into workerPool captures this and dereferences nodeman, which can outlive ActiveContext if the pool is owned by QuorumRole; ensure shutdown is enforced by either stopping the pool before any ActiveContext members are destroyed or asserting shutdown has already occurred. Modify the code around workerPool.push (the lambda that calls DataRecoveryThread with pQuorum/pIndex and nodeman->GetProTxHash()) to first call or verify Stop() (or an equivalent self-enforcing shutdown on workerPool) or add an assertion that ActiveContext has already been stopped, so the lambda cannot run after ActiveContext destruction; update any related lifetimes or comments accordingly to reference workerPool, Stop(), ActiveContext, QuorumRole, nodeman, and DataRecoveryThread.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/active/context.cpp`:
- Around line 153-154: Reformat the function signature and wrapped lines for
ActiveContext::GetQuorumRecoveryStartOffset and the nearby blocks mentioned (the
function headers and wrapped calls/conditionals in the chunks around lines
191-193, 203-205, and 241-330) with clang-format so they match the project's
style; run clang-format (or the repo's formatting command) on
src/active/context.cpp to reflow the long signatures, wrapped calls, and
conditionals and commit the updated formatting ensuring the signatures like
ActiveContext::GetQuorumRecoveryStartOffset are a single consistent style as
enforced by CI.
In `@src/llmq/observer.cpp`:
- Around line 57-62: ObserverContext::InitializeCurrentBlockTip is calling
llmq::QuorumRole::InitializeQuorumConnections(tip) after calling
UpdatedBlockTip(tip, nullptr, ibd), which duplicates the quorum init logic and
bypasses the IBD guard inside QuorumRole::UpdatedBlockTip; remove the explicit
call to llmq::QuorumRole::InitializeQuorumConnections(tip) from
InitializeCurrentBlockTip so quorum connections are only initialized via
UpdatedBlockTip/QuorumRole::UpdatedBlockTip (which already checks ibd) and let
the existing UpdatedBlockTip flow handle connection setup.
- Around line 44-47: The code currently sets ENCRYPTED_CONTRIBUTIONS_MISSING and
returns MisbehavingError when request_limit_exceeded is true, which penalizes
peers for a condition observers can't serve; change the branch in observer.cpp
so that after
request.SetError(CQuorumDataRequest::Errors::ENCRYPTED_CONTRIBUTIONS_MISSING)
you always return MessageProcessingResult{} (no MisbehavingError) regardless of
request_limit_exceeded, matching CQuorumManager::sendQDATA() behavior and
avoiding misbehavior scoring for ENCRYPTED_CONTRIBUTIONS_MISSING.
---
Outside diff comments:
In `@src/llmq/quorumsman.h`:
- Around line 135-159: Re-run clang-format on the updated declarations in
quorumsman.h and fix formatting to match the repository style for the following
declarations: RequestQuorumData, GetQuorum, both ScanQuorums overloads,
IsMasternode, IsWatching, IsDataRequestPending, GetDataRequestStatus,
CleanupExpiredDataRequests, and CleanupOldQuorumData; ensure spacing, line
breaks, parameter wrapping and the placement/spacing of the
EXCLUSIVE_LOCKS_REQUIRED annotations match existing code style (and reformat the
other affected virtual/signature blocks in the file the same way).
---
Duplicate comments:
In `@src/active/context.cpp`:
- Around line 291-294: The lambda pushed into workerPool captures this and
dereferences nodeman, which can outlive ActiveContext if the pool is owned by
QuorumRole; ensure shutdown is enforced by either stopping the pool before any
ActiveContext members are destroyed or asserting shutdown has already occurred.
Modify the code around workerPool.push (the lambda that calls DataRecoveryThread
with pQuorum/pIndex and nodeman->GetProTxHash()) to first call or verify Stop()
(or an equivalent self-enforcing shutdown on workerPool) or add an assertion
that ActiveContext has already been stopped, so the lambda cannot run after
ActiveContext destruction; update any related lifetimes or comments accordingly
to reference workerPool, Stop(), ActiveContext, QuorumRole, nodeman, and
DataRecoveryThread.
In `@src/llmq/quorumsman.cpp`:
- Around line 772-775: The base destructor QuorumRole::~QuorumRole currently
calls Stop() too late (after derived destructors run); move the primary shutdown
of the shared worker pool out of the base destructor and ensure derived classes
(e.g., ObserverContext and ActiveContext) explicitly call Stop() early in their
own destructors or owners perform an early Stop() before tearing down derived
members; update or add explicit Stop() calls in ActiveContext::~ActiveContext
and ObserverContext::~ObserverContext (or their owning shutdown paths) so the
workerPool and methods like
TriggerQuorumDataRecoveryThreads/StartDataRecoveryThread/StartVvecSyncThread are
stopped before derived member destruction, and keep QuorumRole::~QuorumRole as a
defensive check that verifies shutdown already occurred rather than performing
it.
- Around line 917-923: The loop currently increments nTries before calling
m_qman.IsDataRequestPending and then uses continue, which consumes an attempt
without waiting; to fix, compute the candidate member index using a temporary
variable (e.g., idx = (start_offset + nTries) % vecMemberHashes.size()) and call
IsDataRequestPending with that candidate without incrementing nTries, then only
increment nTries when you actually perform an attempt (after passing the pending
check) or otherwise perform the 1-second backoff before continuing; adjust
quorumThreadInterrupt.sleep_for usage so the pending-case flows through the same
backoff path instead of skipping it.
In `@src/llmq/quorumsman.h`:
- Line 75: Add a forward declaration for llmq::QuorumRole before the declaration
of CQuorumManager so the header is self-contained: locate the block with the
other llmq forward declarations (where CQuorumManager is declared) and add
"namespace llmq { class QuorumRole; }" (or equivalent) so members like
llmq::QuorumRole* m_handler and any use at lines referencing QuorumRole compile
without needing the full class definition.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: abc2aee3-2464-4810-95b6-71fe1dde4c97
📒 Files selected for processing (21)
- src/Makefile.am
- src/active/context.cpp
- src/active/context.h
- src/active/dkgsession.cpp
- src/active/dkgsessionhandler.cpp
- src/active/quorums.cpp
- src/active/quorums.h
- src/init.cpp
- src/llmq/dkgsession.cpp
- src/llmq/dkgsession.h
- src/llmq/observer.cpp
- src/llmq/observer.h
- src/llmq/observer/quorums.cpp
- src/llmq/observer/quorums.h
- src/llmq/quorums.h
- src/llmq/quorumsman.cpp
- src/llmq/quorumsman.h
- src/net_processing.cpp
- src/node/context.cpp
- src/rpc/quorums.cpp
- test/util/data/non-backported.txt
💤 Files with no reviewable changes (5)
- test/util/data/non-backported.txt
- src/llmq/observer/quorums.h
- src/llmq/observer/quorums.cpp
- src/active/quorums.h
- src/active/quorums.cpp
✅ Files skipped from review due to trivial changes (3)
- src/rpc/quorums.cpp
- src/node/context.cpp
- src/net_processing.cpp
🚧 Files skipped from review as they are similar to previous changes (3)
- src/active/dkgsession.cpp
- src/init.cpp
- src/llmq/quorums.h
src/active/context.cpp
Outdated
size_t ActiveContext::GetQuorumRecoveryStartOffset(const llmq::CQuorum& quorum,
                                                   gsl::not_null<const CBlockIndex*> pIndex) const
Re-run clang-format on this block.
The diff-format job is already failing on these signatures, wrapped calls, and conditionals, so this chunk will keep CI red until it is reformatted.
Also applies to: 191-193, 203-205, 241-330
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/active/context.cpp` around lines 153 - 154, Reformat the function
signature and wrapped lines for ActiveContext::GetQuorumRecoveryStartOffset and
the nearby blocks mentioned (the function headers and wrapped calls/conditionals
in the chunks around lines 191-193, 203-205, and 241-330) with clang-format so
they match the project's style; run clang-format (or the repo's formatting
command) on src/active/context.cpp to reflow the long signatures, wrapped calls,
and conditionals and commit the updated formatting ensuring the signatures like
ActiveContext::GetQuorumRecoveryStartOffset are a single consistent style as
enforced by CI.
This formatting issue is stale on the current SHA. The PR's ClangFormat check passes, so there is no remaining actionable diff-format problem here.
@thepastaclaw, thanks for the update — acknowledged. If the ClangFormat CI check is passing on the current SHA, there's nothing actionable here. I'll disregard this comment.
🧠 Learnings used
Learnt from: UdjinM6
Repo: dashpay/dash PR: 7265
File: src/wallet/wallet.cpp:3381-3382
Timestamp: 2026-04-03T14:17:17.656Z
Learning: In Dash Core, `strprintf()` uses `tinyformat` (type-safe printf). Do not flag format-specifier mismatches for integer types when using `strprintf()`: `%d` is correct for any integer type, including `CAmount`/`int64_t` arguments. Only treat `%d`/integer combinations as a formatting issue if they occur in a non-`strprintf()` (e.g., libc `printf`) context or there is evidence the format string is not being handled by `tinyformat`.
src/llmq/observer.cpp
Outdated
if (request.GetDataMask() & CQuorumDataRequest::ENCRYPTED_CONTRIBUTIONS) {
    request.SetError(CQuorumDataRequest::Errors::ENCRYPTED_CONTRIBUTIONS_MISSING);
    return request_limit_exceeded ? MisbehavingError{25, "request limit exceeded"} : MessageProcessingResult{};
}
Don't punish retries for data observers can never serve.
CQuorumManager::sendQDATA() explicitly skips misbehavior scoring for ENCRYPTED_CONTRIBUTIONS_MISSING, but this branch returns MisbehavingError for the same case when the request repeats. That turns a benign "data not available here" response into a peer penalty.
Proposed fix
if (request.GetDataMask() & CQuorumDataRequest::ENCRYPTED_CONTRIBUTIONS) {
request.SetError(CQuorumDataRequest::Errors::ENCRYPTED_CONTRIBUTIONS_MISSING);
- return request_limit_exceeded ? MisbehavingError{25, "request limit exceeded"} : MessageProcessingResult{};
+ return {};
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llmq/observer.cpp` around lines 44 - 47, The code currently sets
ENCRYPTED_CONTRIBUTIONS_MISSING and returns MisbehavingError when
request_limit_exceeded is true, which penalizes peers for a condition observers
can't serve; change the branch in observer.cpp so that after
request.SetError(CQuorumDataRequest::Errors::ENCRYPTED_CONTRIBUTIONS_MISSING)
you always return MessageProcessingResult{} (no MisbehavingError) regardless of
request_limit_exceeded, matching CQuorumManager::sendQDATA() behavior and
avoiding misbehavior scoring for ENCRYPTED_CONTRIBUTIONS_MISSING.
net_quorum.cpp already skips misbehavior for ENCRYPTED_CONTRIBUTIONS_MISSING, doesn't it?
There was a problem hiding this comment.
🧩 Analysis chain
Let me check the relevant code in net_quorum.cpp to verify this.
🏁 Script executed:
#!/bin/bash
rg -n "ENCRYPTED_CONTRIBUTIONS" --type=cpp -C 5
Length of output: 9330
@knst — you're right. Looking at the actual net_quorum.cpp code, lines 94–97 explicitly suppress misbehavior for ENCRYPTED_CONTRIBUTIONS_MISSING:
case (CQuorumDataRequest::Errors::QUORUM_VERIFICATION_VECTOR_MISSING):
case (CQuorumDataRequest::Errors::ENCRYPTED_CONTRIBUTIONS_MISSING):
// Do not punish limit exceed if we don't have the requested data
        break;

The caller in net_quorum.cpp inspects the error code set on the request object via this switch, and since ENCRYPTED_CONTRIBUTIONS_MISSING falls into the break-only branch (leaving misbehave as false), no peer penalty is issued regardless of what the ProcessContribQGETDATA return value signals. My concern was incorrect. Apologies for the noise.
✏️ Learnings added
Learnt from: knst
Repo: dashpay/dash PR: 7247
File: src/llmq/observer.cpp:0-0
Timestamp: 2026-04-01T13:08:07.207Z
Learning: In `src/llmq/net_quorum.cpp`, the QGETDATA handler has an explicit switch statement that suppresses misbehavior scoring for `CQuorumDataRequest::Errors::ENCRYPTED_CONTRIBUTIONS_MISSING` and `QUORUM_VERIFICATION_VECTOR_MISSING`. Even if `ObserverContext::ProcessContribQGETDATA` in `src/llmq/observer.cpp` returns a `MisbehavingError`, the caller in `net_quorum.cpp` determines misbehavior by inspecting the error code set on the request object, not the return value of the processing method. The switch explicitly does NOT set `misbehave = request_limit_exceeded` for these two error cases.
🧠 Learnings used
Learnt from: knst
Repo: dashpay/dash PR: 7247
File: src/llmq/quorumsman.cpp:0-0
Timestamp: 2026-04-01T12:54:47.684Z
Learning: In `src/llmq/quorumsman.cpp`, the `QuorumRole::DataRecoveryThread` loop intentionally increments `nTries` before calling `m_qman.IsDataRequestPending()` and uses `continue` without a sleep when the request is already pending. This is by design: the loop cycles through `vecMemberHashes` to find a member without an outstanding request, not a bug. This is moved code from the previous `QuorumParticipant`/`QuorumObserver` implementation (PR `#7247`).
I couldn't confirm a current-SHA issue here. In the refactored path, sendQDATA() still suppresses peer punishment for ENCRYPTED_CONTRIBUTIONS_MISSING, so the observer-side 'data not available here' case is not turned into a misbehavior penalty.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
No further action needed here. This follow-up already acknowledges that the observer-side ENCRYPTED_CONTRIBUTIONS_MISSING path does not actually trigger peer punishment in the current code.
thepastaclaw
left a comment
Code Review
The incremental push rebases the original refactoring on top of three merged bugfix/test PRs (#7240, #7241, #7252) and adds two new clean refactoring commits: de-duplicating UpdatedBlockTip/InitializeQuorumConnections code and simplifying the observer_ctx guard in init.cpp. The two prior review findings (protected visibility for Template Method hooks, worker pool lifetime safety) remain unaddressed.
Reviewed commit: 98399b4
🟡 2 suggestion(s)
1 additional finding
🟡 suggestion: Worker pool tasks can outlive ActiveContext members on abnormal destruction
src/active/context.cpp (lines 76-80)
Worker pool tasks launched by StartDataRecoveryThread capture this (ActiveContext*) and dereference nodeman->GetProTxHash(). The worker pool Stop() lives in ~QuorumRole(), but C++ destruction order destroys ActiveContext members (including nodeman) before ~QuorumRole() runs. The normal shutdown path calls ActiveContext::Stop() first, but if destruction occurs without an explicit Stop() (e.g., exception during initialization), in-flight tasks could use a destroyed nodeman. Adding QuorumRole::Stop() in ~ActiveContext() would make the contract self-enforcing. (Same finding as prior review — not yet addressed.)
💡 Suggested change
ActiveContext::~ActiveContext()
{
// Ensure worker pool is stopped before our members are destroyed,
// since tasks may capture `this` and dereference nodeman.
llmq::QuorumRole::Stop();
m_qman.DisconnectManagers();
m_isman.DisconnectSigner();
}
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/llmq/quorumsman.h`:
- [SUGGESTION] lines 245-250: CheckQuorumConnections and TriggerQuorumDataRecoveryThreads are still public — should be protected
These two virtual methods are Template Method hooks — they are only ever called from within the QuorumRole hierarchy (by `InitializeQuorumConnections` and `UpdatedBlockTip`). Exposing them as `public` allows external callers to invoke per-LLMQ-type methods directly, bypassing the intended `InitializeQuorumConnections` entry point which iterates all types. In `ActiveContext::context.h` the overrides are correctly declared in the `protected` section, confirming the intended visibility. (Same finding as prior review — not yet addressed.)
In `src/active/context.cpp`:
- [SUGGESTION] lines 76-80: Worker pool tasks can outlive ActiveContext members on abnormal destruction
Worker pool tasks launched by `StartDataRecoveryThread` capture `this` (ActiveContext*) and dereference `nodeman->GetProTxHash()`. The worker pool `Stop()` lives in `~QuorumRole()`, but C++ destruction order destroys ActiveContext members (including `nodeman`) *before* `~QuorumRole()` runs. The normal shutdown path calls `ActiveContext::Stop()` first, but if destruction occurs without an explicit `Stop()` (e.g., exception during initialization), in-flight tasks could use a destroyed `nodeman`. Adding `QuorumRole::Stop()` in `~ActiveContext()` would make the contract self-enforcing. (Same finding as prior review — not yet addressed.)
I have a couple of changes to add to the PR, please wait with the review.
76e8e3a to 8311e2e
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8311e2e592
src/init.cpp
Outdated
}
// Note: active_ctx initialization is deferred until after nodeman->Init()
// so that GetProTxHash() is available for quorum connection setup.
GetMainSignals().InitializeCurrentBlockTip(tip, ibd);
Defer first quorum-tip initialization until proTxHash is set
Calling InitializeCurrentBlockTip here runs NetQuorum::InitializeCurrentBlockTip before nodeman->Init() (which is now done later at lines 2494-2502). In active masternode mode this triggers quorum scanning with a null GetProTxHash(), so CQuorumManager::BuildQuorumContributions cannot set skShare and caches recent quorums without it; subsequent lookups reuse the cached entries and do not rebuild. With -llmq-data-recovery=0 (or if recovery peers are unavailable), the node can stay unable to produce signature shares after restart.
it looks like an existing issue [if there is any]; this PR is pure refactoring
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/llmq/quorumsman.h (1)
132-205: ⚠️ Potential issue | 🟡 Minor

Run clang-format on the new API block.

The diff-format job is still failing on these declarations, enum wrapping, and the `QuorumRole` block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/quorumsman.h` around lines 132 - 205, Run clang-format on src/llmq/quorumsman.h to fix the formatting in the newly added API block: reformat the declarations for RegisterDataRequest, ValidateDataResponse, IsDataRequestPending, GetDataRequestStatus, CleanupExpiredDataRequests, CleanupOldQuorumData, GetCachedMutableQuorum, WriteContributions, QueueQuorumForWarming, BuildQuorumFromCommitment, GetQuorum, CacheWarmingThreadMain, MigrateOldQuorumDB, the SIGN_HEIGHT_OFFSET line, SelectQuorumForSigning, VerifyRecoveredSig, and the QuorumRole class so they follow the project's style (line wrapping, enum class/DataResponseValidation formatting, spacing and brace placement); simply run the repository's clang-format configuration (or your editor's clang-format) and commit the resulting whitespace-only changes.
♻️ Duplicate comments (3)
src/llmq/net_quorum.cpp (2)
321-329: ⚠️ Potential issue | 🟠 Major

Avoid the unconditional second quorum-connection pass.

`UpdatedBlockTip(tip, nullptr, ibd)` already runs the connection scan when the node is ready. The extra loop repeats that work on synced restarts, and it still opens quorum connections during IBD.

Suggested fix

 void NetQuorum::InitializeCurrentBlockTip(const CBlockIndex* tip, bool ibd)
 {
     if (!m_role) return;
     UpdatedBlockTip(tip, nullptr, ibd);
-    if (tip) {
+    if (tip && !ibd && !m_mn_sync.IsBlockchainSynced()) {
         for (const auto& params : Params().GetConsensus().llmqs) {
             CheckQuorumConnections(params, tip);
         }
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/net_quorum.cpp` around lines 321 - 329, NetQuorum::InitializeCurrentBlockTip currently calls UpdatedBlockTip(tip, nullptr, ibd) and then unconditionally repeats a quorum-connection scan with a loop over Params().GetConsensus().llmqs calling CheckQuorumConnections(params, tip); remove that duplicate pass: delete the for-loop (or at minimum guard it so it does not run during IBD and does not run when UpdatedBlockTip already performed the scan) so quorum connections are not repeated or opened during initial block download; keep UpdatedBlockTip, tip and ibd handling intact.
145-160: ⚠️ Potential issue | 🟠 Major

Don't score repeated `ENCRYPTED_CONTRIBUTIONS_MISSING` replies.

`sendQDATA()` explicitly suppresses scoring for `ENCRYPTED_CONTRIBUTIONS_MISSING`, but `misbehave_contrib` turns it back on in the final limit-exceeded check. Observers hit this path on every contribution request because `m_nodeman` is null.

Suggested fix

- bool misbehave_contrib = ProcessContribQGETDATA(ssResponseData, *pQuorum, request, pQuorumBaseBlockIndex);
+ ProcessContribQGETDATA(ssResponseData, *pQuorum, request, pQuorumBaseBlockIndex);
@@
- if (request_limit_exceeded && (misbehave_contrib || misbehave_qdata)) {
+ if (request_limit_exceeded && misbehave_qdata) {
      m_peer_manager->PeerMisbehaving(pfrom.GetId(), 25, "request limit exceeded");
  }

Also applies to: 233-248
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/net_quorum.cpp` around lines 145 - 160, The combined misbehaviour check incorrectly re-enables scoring for ENCRYPTED_CONTRIBUTIONS_MISSING because misbehave_contrib (from ProcessContribQGETDATA) can be true even when sendQDATA suppressed scoring; update the limit-exceeded branch to ignore misbehave_contrib when the effective error is CQuorumDataRequest::Errors::ENCRYPTED_CONTRIBUTIONS_MISSING (or when request.GetError()/ret_err equals that value) so that (misbehave_contrib || misbehave_qdata) only triggers PeerMisbehaving if the error is not ENCRYPTED_CONTRIBUTIONS_MISSING; adjust the same logic in the other occurrence (around the block that includes ProcessContribQGETDATA, sendQDATA, request_limit_exceeded and m_peer_manager->PeerMisbehaving) to ensure ENCRYPTED_CONTRIBUTIONS_MISSING never causes PeerMisbehaving.
src/llmq/quorumsman.h (1)
53-63: ⚠️ Potential issue | 🔴 Critical

Forward-declare `QuorumRole` before `CQuorumManager`.

This header stores `llmq::QuorumRole*` on Line 62 and accepts it in `ConnectManagers()` on Line 98, but the class is only declared on Line 189. That makes quorumsman.h depend on include order instead of being self-contained.

Minimal fix

 namespace llmq {
+class QuorumRole;
+
 enum class VerifyRecSigStatus : uint8_t {

Verification script:

#!/bin/bash
set -euo pipefail
rg -n 'class QuorumRole|m_handler|ConnectManagers' src/llmq/quorumsman.h
sed -n '38,110p' src/llmq/quorumsman.h
sed -n '185,205p' src/llmq/quorumsman.h

Expected result: the first local `class QuorumRole` appears after the `m_handler`/`ConnectManagers` uses, confirming the need for a forward declaration in this header.

Also applies to: 98-105
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/quorumsman.h` around lines 53 - 63, CQuorumManager.h currently uses the llmq::QuorumRole pointer (m_handler) and accepts a QuorumRole in ConnectManagers() before the QuorumRole class is declared, creating an include-order dependency; fix this by adding a forward declaration for namespace llmq { class QuorumRole; } near the top of quorumsman.h (before the CQuorumManager declaration) so CQuorumManager, its member m_handler and its ConnectManagers() declaration compile without requiring the full QuorumRole definition.
🧹 Nitpick comments (1)
src/llmq/net_quorum.h (1)
45-46: Mention the validation-interface registration in this ownership note.
`PeerManagerImpl::AddExtraHandler()` also exposes `NetQuorum` through `RegisterValidationInterface()`, so this object has an extra non-owning reference path until `StopHandlers()` unregisters it. Calling that out here would make the lifetime contract around `m_role` and `m_nodeman` much clearer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmq/net_quorum.h` around lines 45 - 46, Update the ownership comment for NetQuorum to note that although it is owned exclusively by PeerManagerImpl via AddExtraHandler, it is also exposed via RegisterValidationInterface by PeerManagerImpl::AddExtraHandler, creating an additional non-owning reference path until StopHandlers() calls unregister; mention this to clarify the lifetime guarantees around m_role and m_nodeman and that they remain valid until the validation interface is unregistered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/active/dkgsessionhandler.cpp`:
- Around line 108-113: The LogPrintf calls in ActiveDKGSessionHandler (using
__func__, pQuorumBaseBlockIndex->nHeight, params.name, quorumIndex) are
misformatted; re-run clang-format or manually reflow the argument list so the
lines match the project's formatting rules (wrap/align arguments consistently),
ensuring both the "quorum initialization failed" and "quorum initialization OK"
LogPrintf invocations are formatted identically and pass the diff-format check.
In `@src/init.cpp`:
- Around line 2427-2430: Drain the validation callback queue before broadcasting
the current tip: after obtaining tip and ibd (the lines using
WITH_LOCK(::cs_main) and chainman.ActiveChainstate().IsInitialBlockDownload()),
call SyncWithValidationInterfaceQueue() to ensure any
BlockConnected/UpdatedBlockTip work left by ThreadImport() has been processed,
then call GetMainSignals().InitializeCurrentBlockTip(tip, ibd); this guarantees
subscribers see the final tip after older callbacks have run.
In `@src/llmq/dkgsession.h`:
- Around line 396-400: The new accessor declarations (GetContribution,
GetComplaint, GetJustification, GetPrematureCommitment) are not formatted to
project style; run clang-format or reflow these declarations so line-wrapping
and spacing match the repo's clang-format rules (preserving the signature and
the EXCLUSIVE_LOCKS_REQUIRED(!invCs) attribute and return types) to resolve the
diff-format failure.
In `@src/llmq/net_quorum.cpp`:
- Around line 31-706: The file src/llmq/net_quorum.cpp has formatting issues;
run the project's clang-format configuration on this file (e.g. clang-format -i
--style=file src/llmq/net_quorum.cpp) to normalize spacing/indentation and line
breaks across all functions (NetQuorum::NetQuorum, NetQuorum::Start,
NetQuorum::ProcessMessage, ProcessContribQGETDATA, ProcessContribQDATA,
DataRecoveryThread, StartVvecSyncThread, StartSkShareRecoveryThread,
StartCleanupOldQuorumDataThread, etc.), commit the formatted file, and re-run
the diff-format job to ensure the CI/lint errors are resolved.
- Around line 72-75: The QGETDATA guard currently rejects when m_role is null
(observer/watch mode) causing qwatch connections to be mis-scored; change the
conditional logic in the NetMsgType::QGETDATA handling so that qwatch peers are
allowed to proceed regardless of m_role (i.e. only enforce the
m_role/IsMasternode/GetVerifiedProRegTxHash checks for non-qwatch connections).
Update the condition around m_peer_manager->PeerMisbehaving call (referencing
NetMsgType::QGETDATA, m_role, IsMasternode(), pfrom.GetVerifiedProRegTxHash(),
and pfrom.qwatch) so watch-only observers are not penalized; leave the existing
m_nodeman nullptr handling further down unchanged.
In `@src/llmq/net_quorum.h`:
- Around line 48-116: The NetQuorum class declaration has inconsistent
wrapping/indentation (clang-format errors); run clang-format over this file and
reformat the NetQuorum declaration and its member function/type lines (including
the constructor NetQuorum(...), overridden NetHandler methods
ProcessMessage/Start/Stop/Interrupt, CValidationInterface methods
InitializeCurrentBlockTip/UpdatedBlockTip, and private helpers like
RequestQuorumData, GetQuorumsToDelete, CheckQuorumConnections,
TriggerQuorumDataRecoveryThreads, DataRecoveryThread, StartVvecSyncThread,
TryStartVvecSyncThread, StartSkShareRecoveryThread,
GetQuorumRecoveryStartOffset, StartCleanupOldQuorumDataThread,
ProcessContribQGETDATA, ProcessContribQDATA, and all member variable
declarations) so wrapping, alignment, and indentation match the project's
clang-format style; then re-run the CI formatting check to ensure the
diff-format job passes.
In `@src/llmq/observer.h`:
- Around line 35-52: Run clang-format on the ObserverContext header block to
normalize spacing and line breaks for the constructor and inline overrides;
specifically reformat the ObserverContext(...) constructor declaration and the
inline override methods IsMasternode, IsWatching, and SetQuorumSecretKeyShare so
they follow the project's style (consistent indentation, space before/after
braces and parameters, and each inline method signature/braced body on its own
properly formatted line) to satisfy the diff-format check.
In `@src/llmq/quorumsman.cpp`:
- Around line 365-395: The new helper functions RegisterDataRequest and
ValidateDataResponse have inconsistent wrapping and indentation that fails the
diff-format check; run clang-format (or apply the project's style rules) on this
block and reformat these functions so they match the project's style (fix
indentation, line breaks around the LOCK(cs_data_requests) lines, the auto
[old_pair, inserted] = ... destructuring line, parameter wrapping for
CQuorumManager::ValidateDataResponse, and closing braces) to ensure the
diff-format job passes.
---
Outside diff comments:
In `@src/llmq/quorumsman.h`:
- Around line 132-205: Run clang-format on src/llmq/quorumsman.h to fix the
formatting in the newly added API block: reformat the declarations for
RegisterDataRequest, ValidateDataResponse, IsDataRequestPending,
GetDataRequestStatus, CleanupExpiredDataRequests, CleanupOldQuorumData,
GetCachedMutableQuorum, WriteContributions, QueueQuorumForWarming,
BuildQuorumFromCommitment, GetQuorum, CacheWarmingThreadMain,
MigrateOldQuorumDB, the SIGN_HEIGHT_OFFSET line, SelectQuorumForSigning,
VerifyRecoveredSig, and the QuorumRole class so they follow the project's style
(line wrapping, enum class/DataResponseValidation formatting, spacing and brace
placement); simply run the repository's clang-format configuration (or your
editor's clang-format) and commit the resulting whitespace-only changes.
---
Duplicate comments:
In `@src/llmq/net_quorum.cpp`:
- Around line 321-329: NetQuorum::InitializeCurrentBlockTip currently calls
UpdatedBlockTip(tip, nullptr, ibd) and then unconditionally repeats a
quorum-connection scan with a loop over Params().GetConsensus().llmqs calling
CheckQuorumConnections(params, tip); remove that duplicate pass: delete the
for-loop (or at minimum guard it so it does not run during IBD and does not run
when UpdatedBlockTip already performed the scan) so quorum connections are not
repeated or opened during initial block download; keep UpdatedBlockTip, tip and
ibd handling intact.
- Around line 145-160: The combined misbehaviour check incorrectly re-enables
scoring for ENCRYPTED_CONTRIBUTIONS_MISSING because misbehave_contrib (from
ProcessContribQGETDATA) can be true even when sendQDATA suppressed scoring;
update the limit-exceeded branch to ignore misbehave_contrib when the effective
error is CQuorumDataRequest::Errors::ENCRYPTED_CONTRIBUTIONS_MISSING (or when
request.GetError()/ret_err equals that value) so that (misbehave_contrib ||
misbehave_qdata) only triggers PeerMisbehaving if the error is not
ENCRYPTED_CONTRIBUTIONS_MISSING; adjust the same logic in the other occurrence
(around the block that includes ProcessContribQGETDATA, sendQDATA,
request_limit_exceeded and m_peer_manager->PeerMisbehaving) to ensure
ENCRYPTED_CONTRIBUTIONS_MISSING never causes PeerMisbehaving.
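The scoring rule described above can be sketched as a small predicate. The enum and helper names below are illustrative stand-ins, not the real CQuorumDataRequest API:

```cpp
// Hypothetical stand-ins for the error codes involved; not the real
// CQuorumDataRequest::Errors enum from Dash Core.
enum class Errors {
    NONE,
    ENCRYPTED_CONTRIBUTIONS_MISSING,
    QUORUM_VERIFICATION_VECTOR_MISSING,
};

// Misbehaviour flags trigger peer scoring only when the effective error is
// not ENCRYPTED_CONTRIBUTIONS_MISSING, matching the rule in the comment above.
inline bool ShouldPenalizePeer(bool misbehave_contrib, bool misbehave_qdata, Errors effective_error)
{
    if (effective_error == Errors::ENCRYPTED_CONTRIBUTIONS_MISSING) {
        return false; // suppressed: never escalate this error to PeerMisbehaving
    }
    return misbehave_contrib || misbehave_qdata;
}
```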
In `@src/llmq/quorumsman.h`:
- Around line 53-63: CQuorumManager.h currently uses the llmq::QuorumRole
pointer (m_handler) and accepts a QuorumRole in ConnectManagers() before the
QuorumRole class is declared, creating an include-order dependency; fix this by
adding a forward declaration for namespace llmq { class QuorumRole; } near the
top of quorumsman.h (before the CQuorumManager declaration) so CQuorumManager,
its member m_handler and its ConnectManagers() declaration compile without
requiring the full QuorumRole definition.
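The fix is the standard forward-declaration pattern. A compilable sketch under assumed shapes (the member and method names mirror the PR description; the bodies are placeholders, not Dash Core code):

```cpp
namespace llmq { class QuorumRole; } // forward declaration: enough for pointers/references

class CQuorumManager
{
public:
    // Accepting a pointer needs only the forward declaration above,
    // not the full QuorumRole definition.
    void ConnectManagers(llmq::QuorumRole* handler) { m_handler = handler; }
    bool HasHandler() const { return m_handler != nullptr; }

private:
    llmq::QuorumRole* m_handler{nullptr};
};

// The full definition may appear later (or in another header) without
// changing CQuorumManager or its include order.
namespace llmq { class QuorumRole {}; }
```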
---
Nitpick comments:
In `@src/llmq/net_quorum.h`:
- Around line 45-46: Update the ownership comment for NetQuorum to note that
although it is owned exclusively by PeerManagerImpl via AddExtraHandler, it is
also exposed via RegisterValidationInterface by
PeerManagerImpl::AddExtraHandler, creating an additional non-owning reference
path until StopHandlers() calls unregister; mention this to clarify the lifetime
guarantees around m_role and m_nodeman and that they remain valid until the
validation interface is unregistered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3efa6f52-0a22-4cae-9543-3862cd739f6a
📒 Files selected for processing (29)
- src/Makefile.am
- src/active/context.cpp
- src/active/context.h
- src/active/dkgsession.cpp
- src/active/dkgsessionhandler.cpp
- src/active/quorums.cpp
- src/active/quorums.h
- src/chainlock/handler.cpp
- src/dsnotificationinterface.cpp
- src/dsnotificationinterface.h
- src/init.cpp
- src/llmq/dkgsession.cpp
- src/llmq/dkgsession.h
- src/llmq/net_quorum.cpp
- src/llmq/net_quorum.h
- src/llmq/observer.cpp
- src/llmq/observer.h
- src/llmq/observer/quorums.cpp
- src/llmq/observer/quorums.h
- src/llmq/quorums.h
- src/llmq/quorumsman.cpp
- src/llmq/quorumsman.h
- src/net_processing.cpp
- src/node/context.cpp
- src/rpc/quorums.cpp
- src/validationinterface.cpp
- src/validationinterface.h
- test/lint/lint-circular-dependencies.py
- test/util/data/non-backported.txt
💤 Files with no reviewable changes (5)
- test/util/data/non-backported.txt
- src/llmq/observer/quorums.h
- src/llmq/observer/quorums.cpp
- src/active/quorums.cpp
- src/active/quorums.h
✅ Files skipped from review due to trivial changes (3)
- src/chainlock/handler.cpp
- src/node/context.cpp
- test/lint/lint-circular-dependencies.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/rpc/quorums.cpp
- src/llmq/quorums.h
src/init.cpp
Outdated
{
    const CBlockIndex* tip = WITH_LOCK(::cs_main, return chainman.ActiveTip());
    const bool ibd = chainman.ActiveChainstate().IsInitialBlockDownload();
    if (node.observer_ctx && !node.active_ctx) {
        node.observer_ctx->InitializeCurrentBlockTip(tip, ibd);
    }
    // Note: active_ctx initialization is deferred until after nodeman->Init()
    // so that GetProTxHash() is available for quorum connection setup.
    GetMainSignals().InitializeCurrentBlockTip(tip, ibd);
Drain the validation queue before this startup replay.
ThreadImport() can still leave earlier BlockConnected/UpdatedBlockTip work queued here. Broadcasting InitializeCurrentBlockTip() immediately lets subscribers observe the final tip before those older callbacks run.
Suggested fix
{
+ SyncWithValidationInterfaceQueue();
const CBlockIndex* tip = WITH_LOCK(::cs_main, return chainman.ActiveTip());
const bool ibd = chainman.ActiveChainstate().IsInitialBlockDownload();
GetMainSignals().InitializeCurrentBlockTip(tip, ibd);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/init.cpp` around lines 2427 - 2430, Drain the validation callback queue
before broadcasting the current tip: after obtaining tip and ibd (the lines
using WITH_LOCK(::cs_main) and
chainman.ActiveChainstate().IsInitialBlockDownload()), call
SyncWithValidationInterfaceQueue() to ensure any BlockConnected/UpdatedBlockTip
work left by ThreadImport() has been processed, then call
GetMainSignals().InitializeCurrentBlockTip(tip, ibd); this guarantees
subscribers see the final tip after older callbacks have run.
it's okay if the tip changes afterwards, but it must be updated after nodeman is initialized.
@knst, understood — the design is intentional. The concern about queued BlockConnected/UpdatedBlockTip callbacks from ThreadImport() racing with the InitializeCurrentBlockTip() broadcast is not an issue since subscribers are designed to handle tip updates that arrive later. The only hard ordering constraint is that the call must occur after nodeman->Init() (for the active_ctx path) so that GetProTxHash() returns a valid value when quorum connections are set up. Withdrawing the suggestion.
✏️ Learnings added
Learnt from: knst
Repo: dashpay/dash PR: 7247
File: src/init.cpp:2432-2435
Timestamp: 2026-04-01T13:12:12.362Z
Learning: In `src/init.cpp` (PR `#7247`), the two `GetMainSignals().InitializeCurrentBlockTip(tip, ibd)` calls inside the `loadblk` thread are intentionally placed as follows:
1. A first call fires before `node.active_ctx->nodeman->Init()` — covers the general/observer case.
2. A second call fires immediately after `node.active_ctx->nodeman->Init()` inside `if (node.active_ctx)` — this is required so `GetProTxHash()` (set by `nodeman->Init()`) is valid when quorum connection setup runs.
It is acceptable that the tip may change after these calls; no `SyncWithValidationInterfaceQueue()` fence is required before them. The critical invariant is ordering relative to `nodeman->Init()`, not relative to the validation-interface queue.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: knst
Repo: dashpay/dash PR: 7247
File: src/llmq/quorumsman.cpp:0-0
Timestamp: 2026-04-01T12:54:47.684Z
Learning: In `src/llmq/quorumsman.cpp`, the `QuorumRole::DataRecoveryThread` loop intentionally increments `nTries` before calling `m_qman.IsDataRequestPending()` and uses `continue` without a sleep when the request is already pending. This is by design: the loop cycles through `vecMemberHashes` to find a member without an outstanding request, not a bug. This is moved code from the previous `QuorumParticipant`/`QuorumObserver` implementation (PR `#7247`).
Learnt from: knst
Repo: dashpay/dash PR: 7247
File: src/llmq/observer.cpp:0-0
Timestamp: 2026-04-01T13:08:07.207Z
Learning: In `src/llmq/net_quorum.cpp`, the QGETDATA handler has an explicit switch statement that suppresses misbehavior scoring for `CQuorumDataRequest::Errors::ENCRYPTED_CONTRIBUTIONS_MISSING` and `QUORUM_VERIFICATION_VECTOR_MISSING`. Even if `ObserverContext::ProcessContribQGETDATA` in `src/llmq/observer.cpp` returns a `MisbehavingError`, the caller in `net_quorum.cpp` determines misbehavior by inspecting the error code set on the request object, not the return value of the processing method. The switch explicitly does NOT set `misbehave = request_limit_exceeded` for these two error cases.
I don't see a current-SHA correctness issue here. The startup replay ordering is intentionally handled around nodeman->Init(), and I did not find evidence that an added SyncWithValidationInterfaceQueue() is required for correctness in this refactor.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
No further action needed here. This follow-up already withdraws the queue-drain suggestion and reflects the intended startup ordering on the current SHA.
if (msg_type == NetMsgType::QGETDATA) {
    if (!m_role || !m_role->IsMasternode() || (pfrom.GetVerifiedProRegTxHash().IsNull() && !pfrom.qwatch)) {
        m_peer_manager->PeerMisbehaving(pfrom.GetId(), 10, "not a verified masternode or a qwatch connection");
        return;
Allow watch-only nodes to answer QGETDATA.
In watch mode init.cpp wires an ObserverContext into NetQuorum, and that role is not a masternode. This guard makes every watcher score inbound quorum-data requests instead of serving verification vectors; the m_nodeman == nullptr case is already handled lower down.
Suggested fix
- if (!m_role || !m_role->IsMasternode() || (pfrom.GetVerifiedProRegTxHash().IsNull() && !pfrom.qwatch)) {
+ const bool can_serve_quorum_data = m_role && (m_role->IsMasternode() || m_role->IsWatching());
+ if (!can_serve_quorum_data || (pfrom.GetVerifiedProRegTxHash().IsNull() && !pfrom.qwatch)) {
m_peer_manager->PeerMisbehaving(pfrom.GetId(), 10, "not a verified masternode or a qwatch connection");
return;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llmq/net_quorum.cpp` around lines 72 - 75, The QGETDATA guard currently
rejects when m_role is null (observer/watch mode) causing qwatch connections to
be mis-scored; change the conditional logic in the NetMsgType::QGETDATA handling
so that qwatch peers are allowed to proceed regardless of m_role (i.e. only
enforce the m_role/IsMasternode/GetVerifiedProRegTxHash checks for non-qwatch
connections). Update the condition around m_peer_manager->PeerMisbehaving call
(referencing NetMsgType::QGETDATA, m_role, IsMasternode(),
pfrom.GetVerifiedProRegTxHash(), and pfrom.qwatch) so watch-only observers are
not penalized; leave the existing m_nodeman nullptr handling further down
unchanged.
I couldn't validate this as a regression in the current SHA. The old CQuorumManager::ProcessMessage() path also gated QGETDATA on IsMasternode(), so observer/watch nodes were not previously serving quorum data here; this refactor preserves that behavior rather than newly breaking it.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check git log to find the removed CQuorumManager::ProcessMessage QGETDATA guard
git log --oneline -10 HEAD
git diff HEAD~1 HEAD -- src/llmq/quorumsman.cpp | grep -A 20 "QGETDATA" | head -60
Length of output: 942
🏁 Script executed:
#!/bin/bash
# Find the commit that moved ProcessMessage from CQuorumManager to NetQuorum
git show 1be8863c7 -- src/llmq/quorumsman.cpp | grep -A 30 "QGETDATA" | head -80
Length of output: 4278
@thepastaclaw You're correct — I've now confirmed from the git history that the old CQuorumManager::ProcessMessage() used the same !IsMasternode() guard for QGETDATA:
if (!IsMasternode() || (pfrom.GetVerifiedProRegTxHash().IsNull() && !pfrom.qwatch)) {
return MisbehavingError{10, "not a verified masternode or a qwatch connection"};
}
The new net_quorum.cpp code faithfully preserves this behavior. Notably, the old code did allow IsWatching() for the QDATA path, but QGETDATA (serving quorum data) was always masternode-only. My original comment was wrong — this is not a regression. Apologies for the noise!
🧠 Learnings used
Learnt from: UdjinM6
Repo: dashpay/dash PR: 7265
File: src/wallet/wallet.cpp:3381-3382
Timestamp: 2026-04-03T14:17:17.656Z
Learning: In Dash Core, `strprintf()` uses `tinyformat` (type-safe printf). Do not flag format-specifier mismatches for integer types when using `strprintf()`: `%d` is correct for any integer type, including `CAmount`/`int64_t` arguments. Only treat `%d`/integer combinations as a formatting issue if they occur in a non-`strprintf()` (e.g., libc `printf`) context or there is evidence the format string is not being handled by `tinyformat`.
This pull request has conflicts, please rebase.
… no more child of 'observer'
… entity to reduce amount of abstractions
…on' [-Werror,-Wshadow-field]
… and InitializeQuorumConnections
…ontribQDATA to NetQuorum
8311e2e to
e2df648
Compare
✅ Review complete (commit 21fe75f)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e2df648675
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
void NetQuorum::Stop()
{
    workerPool.clear_queue();
    workerPool.stop(true);

Interrupt quorum workers before stopping NetQuorum
PrepareShutdown() invokes PeerManager::StopHandlers() directly, and this Stop() implementation no longer signals quorumThreadInterrupt before joining the pool. Recovery jobs in DataRecoveryThread wait/sleep on that interrupt, so workerPool.stop(true) can block for a long time (or indefinitely while waiting for sync conditions), which turns shutdown into a hang. The pre-refactor quorum observer stop path interrupted first, so this is a regression in teardown behavior.
called here:
void PeerManagerImpl::InterruptHandlers()
{
for (auto& handler : m_handlers) {
handler->Interrupt();
}
}
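The ordering being discussed (signal the interrupt token, then join the workers) can be sketched with standard threads. `ThreadInterrupt` below is a simplified stand-in for Dash's `CThreadInterrupt`, not the real class:

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Simplified stand-in for CThreadInterrupt: an interruptible sleep plus a
// one-shot signal that wakes all sleepers.
class ThreadInterrupt
{
public:
    // Returns true if the full duration elapsed, false if interrupted early.
    bool sleep_for(std::chrono::milliseconds dur)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        return !m_cv.wait_for(lock, dur, [this] { return m_flag.load(); });
    }
    void operator()() // signal the interrupt
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex); // set under the mutex to avoid missed wakeups
            m_flag.store(true);
        }
        m_cv.notify_all();
    }

private:
    std::condition_variable m_cv;
    std::mutex m_mutex;
    std::atomic<bool> m_flag{false};
};

// Signalling first wakes any worker sleeping on the token; join() then
// completes promptly instead of hanging until the sleep expires.
inline bool demo_interrupt_then_join()
{
    ThreadInterrupt interrupt;
    std::thread worker([&] {
        // Recovery-style loop: without the interrupt this sleeps ~forever.
        while (interrupt.sleep_for(std::chrono::hours(1))) {}
    });
    interrupt();   // analogous to InterruptHandlers() / quorumThreadInterrupt()
    worker.join(); // analogous to workerPool.stop(true)
    return true;
}
```

Joining without the signal would block until the hour-long sleep expires, which is the shutdown hang described in the finding.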
const CBlockIndex* tip = WITH_LOCK(::cs_main, return chainman.ActiveTip());
const bool ibd = chainman.ActiveChainstate().IsInitialBlockDownload();
node.active_ctx->InitializeCurrentBlockTip(tip, ibd);
GetMainSignals().InitializeCurrentBlockTip(tip, ibd);
Reinitialize active/observer tip state with UpdatedBlockTip
This now broadcasts only InitializeCurrentBlockTip, but ActiveContext/ObserverContext no longer implement that hook after the refactor, so their startup UpdatedBlockTip initialization path is skipped. As a result, quorum/DKG-related state in those contexts is not seeded on restart and remains stale until a later real tip update arrives, which can leave post-boot behavior inconsistent for a synced node.
thepastaclaw
left a comment
Code Review
Largely mechanical refactor that successfully flattens the quorum class hierarchy. Two issues surfaced: NetQuorum::Stop() no longer signals the thread interrupt before joining worker threads, which can stall shutdown when recovery/cleanup tasks are sleeping; and the startup path now broadcasts InitializeCurrentBlockTip twice in masternode mode, replaying validation callbacks for all subscribers rather than just the LLMQ component.
Reviewed commit: e2df648
🔴 1 blocking | 🟡 1 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/llmq/net_quorum.cpp`:
- [BLOCKING] lines 64-68: NetQuorum::Stop() doesn't signal thread interrupt before joining workers
`Stop()` calls `workerPool.clear_queue()` then `workerPool.stop(true)` (join), but never signals `quorumThreadInterrupt`. Worker tasks like `DataRecoveryThread()` and `StartCleanupOldQuorumDataThread()` sleep on `quorumThreadInterrupt` (lines 500-501, 547, 579). If any of those tasks are active when `Stop()` is called, the `stop(true)` join blocks indefinitely because the sleeping thread never wakes up.
The `Interrupt()` method at line 62 does signal `quorumThreadInterrupt()`, but the shutdown path through `PrepareShutdown() → PeerManagerImpl::StopHandlers()` calls `Stop()` without necessarily calling `Interrupt()` first. The pre-refactor code had `QuorumObserver::Stop()` trigger the interrupt before joining.
Fix: call `quorumThreadInterrupt()` at the start of `Stop()`.
In `src/init.cpp`:
- [SUGGESTION] line 2512: InitializeCurrentBlockTip broadcast twice in masternode mode — replays callbacks for all subscribers
`GetMainSignals().InitializeCurrentBlockTip(tip, ibd)` is now called twice in masternode mode: once after `ThreadImport()` (line ~2435) and again after `nodeman->Init()` (line 2512). The second call re-invokes every registered subscriber's `InitializeCurrentBlockTip()`, not just the LLMQ component that needs the `proTxHash`. `CDSNotificationInterface::InitializeCurrentBlockTip()` replays both `SynchronousUpdatedBlockTip()` and `UpdatedBlockTip()` on the same tip, so masternode startup now processes those callbacks twice.
The pre-refactor code targeted only the quorum-specific component after `nodeman->Init()`. Consider either re-initializing only the LLMQ/NetQuorum subscriber, or adding an idempotence guard so non-LLMQ listeners don't reprocess the same tip.
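One shape the suggested idempotence guard could take, assuming a subscriber that tracks the last tip it was seeded with (the names below are illustrative, not existing Dash Core code):

```cpp
struct BlockIndexStub { int height{0}; }; // stand-in for CBlockIndex

// A listener remembers the last tip pointer passed to its
// InitializeCurrentBlockTip hook and ignores exact repeats, so a second
// broadcast on the same tip becomes a no-op for non-LLMQ subscribers.
class TipSeedGuard
{
public:
    bool ShouldProcess(const BlockIndexStub* tip)
    {
        if (tip == m_last_seeded_tip) return false; // already replayed this tip
        m_last_seeded_tip = tip;
        return true;
    }

private:
    const BlockIndexStub* m_last_seeded_tip{nullptr};
};
```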
void NetQuorum::Stop()
{
    workerPool.clear_queue();
    workerPool.stop(true);
}
🔴 Blocking: NetQuorum::Stop() doesn't signal thread interrupt before joining workers
Stop() calls workerPool.clear_queue() then workerPool.stop(true) (join), but never signals quorumThreadInterrupt. Worker tasks like DataRecoveryThread() and StartCleanupOldQuorumDataThread() sleep on quorumThreadInterrupt (lines 500-501, 547, 579). If any of those tasks are active when Stop() is called, the stop(true) join blocks indefinitely because the sleeping thread never wakes up.
The Interrupt() method at line 62 does signal quorumThreadInterrupt(), but the shutdown path through PrepareShutdown() → PeerManagerImpl::StopHandlers() calls Stop() without necessarily calling Interrupt() first. The pre-refactor code had QuorumObserver::Stop() trigger the interrupt before joining.
Fix: call quorumThreadInterrupt() at the start of Stop().
💡 Suggested change
 void NetQuorum::Stop()
 {
+    quorumThreadInterrupt();
     workerPool.clear_queue();
     workerPool.stop(true);
 }
source: ['codex']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/llmq/net_quorum.cpp`:
- [BLOCKING] lines 64-68: NetQuorum::Stop() doesn't signal thread interrupt before joining workers
`Stop()` calls `workerPool.clear_queue()` then `workerPool.stop(true)` (join), but never signals `quorumThreadInterrupt`. Worker tasks like `DataRecoveryThread()` and `StartCleanupOldQuorumDataThread()` sleep on `quorumThreadInterrupt` (lines 500-501, 547, 579). If any of those tasks are active when `Stop()` is called, the `stop(true)` join blocks indefinitely because the sleeping thread never wakes up.
The `Interrupt()` method at line 62 does signal `quorumThreadInterrupt()`, but the shutdown path through `PrepareShutdown() → PeerManagerImpl::StopHandlers()` calls `Stop()` without necessarily calling `Interrupt()` first. The pre-refactor code had `QuorumObserver::Stop()` trigger the interrupt before joining.
Fix: call `quorumThreadInterrupt()` at the start of `Stop()`.
called here:
void PeerManagerImpl::InterruptHandlers()
{
for (auto& handler : m_handlers) {
handler->Interrupt();
}
}
const CBlockIndex* tip = WITH_LOCK(::cs_main, return chainman.ActiveTip());
const bool ibd = chainman.ActiveChainstate().IsInitialBlockDownload();
node.active_ctx->InitializeCurrentBlockTip(tip, ibd);
GetMainSignals().InitializeCurrentBlockTip(tip, ibd);
🟡 Suggestion: InitializeCurrentBlockTip broadcast twice in masternode mode — replays callbacks for all subscribers
GetMainSignals().InitializeCurrentBlockTip(tip, ibd) is now called twice in masternode mode: once after ThreadImport() (line ~2435) and again after nodeman->Init() (line 2512). The second call re-invokes every registered subscriber's InitializeCurrentBlockTip(), not just the LLMQ component that needs the proTxHash. CDSNotificationInterface::InitializeCurrentBlockTip() replays both SynchronousUpdatedBlockTip() and UpdatedBlockTip() on the same tip, so masternode startup now processes those callbacks twice.
The pre-refactor code targeted only the quorum-specific component after nodeman->Init(). Consider either re-initializing only the LLMQ/NetQuorum subscriber, or adding an idempotence guard so non-LLMQ listeners don't reprocess the same tip.
source: ['codex']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/init.cpp`:
- [SUGGESTION] line 2512: InitializeCurrentBlockTip broadcast twice in masternode mode — replays callbacks for all subscribers
`GetMainSignals().InitializeCurrentBlockTip(tip, ibd)` is now called twice in masternode mode: once after `ThreadImport()` (line ~2435) and again after `nodeman->Init()` (line 2512). The second call re-invokes every registered subscriber's `InitializeCurrentBlockTip()`, not just the LLMQ component that needs the `proTxHash`. `CDSNotificationInterface::InitializeCurrentBlockTip()` replays both `SynchronousUpdatedBlockTip()` and `UpdatedBlockTip()` on the same tip, so masternode startup now processes those callbacks twice.
The pre-refactor code targeted only the quorum-specific component after `nodeman->Init()`. Consider either re-initializing only the LLMQ/NetQuorum subscriber, or adding an idempotence guard so non-LLMQ listeners don't reprocess the same tip.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f6581ca0e
llmq::QuorumRole* quorum_role = node.active_ctx ? static_cast<llmq::QuorumRole*>(node.active_ctx.get())
                                                : static_cast<llmq::QuorumRole*>(node.observer_ctx.get());
auto net_quorum = std::make_unique<llmq::NetQuorum>(
    node.peerman.get(), *node.llmq_ctx->bls_worker, *node.connman, *node.dmnman,
Skip NetQuorum setup when no quorum role is active
When the node is neither an active masternode nor started with -watchquorums, quorum_role is null here but NetQuorum is still added as a handler. PeerManagerImpl::StartHandlers() then always calls NetQuorum::Start(), which unconditionally creates a q-mngr thread pool (workerPool.resize(m_worker_count), up to 4 threads by default). Before this refactor, quorum worker threads were only started in active/watch modes, so this introduces avoidable thread/memory overhead for normal nodes.
thepastaclaw
left a comment
Code Review
Thanks for addressing the prior review comments. The forward declaration and ~QuorumRole() = default changes are correct, and knst's clarifications about the shutdown ordering (Interrupt before Stop), the intentional recovery-loop behavior, and the ENCRYPTED_CONTRIBUTIONS_MISSING switch handling all check out.
However, the latest commit (9f6581ca) introduces a build break and CI is red across all platforms.
Reviewed commit: 9f6581c
🔴 1 blocking
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/llmq/net_quorum.cpp`:
- [BLOCKING] lines 587, 620: `pQuorum->validMembers` does not exist — should be `pQuorum->qc->validMembers`
The latest commit added `if (pQuorum->validMembers.empty()) return;` in `StartVvecSyncThread` (line 587) and `StartSkShareRecoveryThread` (line 620). `CQuorum` does not have a `validMembers` member; `validMembers` lives in `CFinalCommitment` accessed via `pQuorum->qc->validMembers`. CI is failing on all platforms with: `error: no member named 'validMembers' in 'llmq::CQuorum'`.
Fix: `pQuorum->validMembers` → `pQuorum->qc->validMembers` in both locations.
9f6581c to
8bc5744
Compare
… and no empty qc member
thepastaclaw
left a comment
Code Review
Verified 10 agent findings against source code on PR #7247 (SHA 55d8025). The convergent InitializeCurrentBlockTip finding is partially valid but lower severity than claimed — NetQuorum::InitializeCurrentBlockTip (net_quorum.cpp:322) properly handles quorum connection initialization at startup; the gap is only that ActiveContext/ObserverContext sub-components miss a tip-based initialization call on idle chains, which self-heals on the next block. The TransactionRemovedFromMempool finding is invalid — no behavioral change. The NetQuorum::Stop deadlock finding is invalid — no circular lock dependency exists. Several minor/cosmetic items confirmed.
Reviewed commit: 55d8025
🟡 3 suggestion(s) | 💬 1 nitpick(s)
3 additional findings
🟡 suggestion: ActiveContext/ObserverContext sub-components miss startup initialization on idle chains
src/active/context.cpp (line 1)
ActiveContext and ObserverContext don't override InitializeCurrentBlockTip (validationinterface.h:121 provides no-op default). NetQuorum::InitializeCurrentBlockTip (net_quorum.cpp:322-331) correctly handles quorum connections and data recovery at startup. However, ActiveContext::UpdatedBlockTip (context.cpp:98-107) calls nodeman->UpdatedBlockTip, ehf_sighandler->UpdatedBlockTip, gov_signer->UpdatedBlockTip, and qdkgsman->UpdatedBlockTip — none of these receive a startup kick on idle chains. These components will catch up when the next block arrives, so this is a minor gap rather than a functional break. The original fix commit (5e79dbe) added InitializeCurrentBlockTip to ActiveContext specifically for quorum connections — that concern is now handled by NetQuorum. Consider whether qdkgsman/ehf_sighandler need an explicit startup call or if next-block catchup is sufficient.
🟡 suggestion: InitializeCurrentBlockTip called twice — intentional but worth a comment clarification
src/init.cpp (line 1)
GetMainSignals().InitializeCurrentBlockTip() is called at init.cpp:2435 (after ThreadImport) and again at init.cpp:2512 (after nodeman->Init). The second call is guarded by if (node.active_ctx) and documented in comments (init.cpp:2506-2509). This is intentional: the first call runs before the masternode's proTxHash is set by nodeman->Init, so quorum connections skip masternodes with null proTxHash. The second call reinitializes with the correct proTxHash.
🟡 suggestion: Consider asserting m_quorum_base_block_index in CQuorum constructor
src/llmq/quorums.cpp (line 1)
CQuorum constructor (quorums.cpp:103-114) asserts qc != nullptr (line 113, added in HEAD commit) but doesn't assert m_quorum_base_block_index. The member is initialized from the constructor parameter _pQuorumBaseBlockIndex (line 108) which could be nullptr. All current callers appear to provide valid indices, and quorums.h:175 has a default of nullptr for the member. An assertion would be a safety belt consistent with the existing qc assertion.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/active/context.cpp`:
- [SUGGESTION] line 1: ActiveContext/ObserverContext sub-components miss startup initialization on idle chains
ActiveContext and ObserverContext don't override InitializeCurrentBlockTip (validationinterface.h:121 provides no-op default). NetQuorum::InitializeCurrentBlockTip (net_quorum.cpp:322-331) correctly handles quorum connections and data recovery at startup. However, ActiveContext::UpdatedBlockTip (context.cpp:98-107) calls nodeman->UpdatedBlockTip, ehf_sighandler->UpdatedBlockTip, gov_signer->UpdatedBlockTip, and qdkgsman->UpdatedBlockTip — none of these receive a startup kick on idle chains. These components will catch up when the next block arrives, so this is a minor gap rather than a functional break. The original fix commit (5e79dbe33e) added InitializeCurrentBlockTip to ActiveContext specifically for quorum connections — that concern is now handled by NetQuorum. Consider whether qdkgsman/ehf_sighandler need an explicit startup call or if next-block catchup is sufficient.
In `src/init.cpp`:
- [SUGGESTION] line 1: InitializeCurrentBlockTip called twice — intentional but worth a comment clarification
GetMainSignals().InitializeCurrentBlockTip() is called at init.cpp:2435 (after ThreadImport) and again at init.cpp:2512 (after nodeman->Init). The second call is guarded by `if (node.active_ctx)` and documented in comments (init.cpp:2506-2509). This is intentional: the first call runs before the masternode's proTxHash is set by nodeman->Init, so quorum connections skip masternodes with null proTxHash. The second call reinitializes with the correct proTxHash.
In `src/llmq/quorums.cpp`:
- [SUGGESTION] line 1: Consider asserting m_quorum_base_block_index in CQuorum constructor
CQuorum constructor (quorums.cpp:103-114) asserts `qc != nullptr` (line 113, added in HEAD commit) but doesn't assert m_quorum_base_block_index. The member is initialized from the constructor parameter _pQuorumBaseBlockIndex (line 108) which could be nullptr. All current callers appear to provide valid indices, and quorums.h:175 has a default of nullptr for the member. An assertion would be a safety belt consistent with the existing qc assertion.
@@ -0,0 +1,712 @@
// Copyright (c) 2025-2026 The Dash Core developers
💬 Nitpick: Missing space between && and std::ranges
At net_quorum.cpp:456: is_masternode &&std::ranges::any_of(...) — missing space after &&. Should be is_masternode && std::ranges::any_of(...).
source: ['claude']
If I understand correctly, this is fixed: I applied the patch from your fix.
On masternodes the first InitializeCurrentBlockTip (before nodeman->Init) now reaches NetQuorum, which triggers ScanQuorums and BuildQuorumContributions while GetProTxHash() is still null. This causes SetQuorumSecretKeyShare to fail, caching the quorum without skShare. The second call after nodeman->Init hits the cache and never rebuilds, leaving the node reliant on network recovery for its own quorum keys. Fix by only calling g_ds_notification_interface directly in the first pass on masternodes, deferring the full broadcast to after nodeman->Init. Non-masternode nodes broadcast normally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ety assert
In TriggerQuorumDataRecoveryThreads, avoid calling GetProTxHash() in observer mode where it returns a null hash. Also adds missing const qualifier and fixes operator spacing. Assert m_quorum_base_block_index in CQuorum constructor, consistent with the existing qc != nullptr assertion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
e9b93e3 to
21fe75f
Compare
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21fe75f0bc
```cpp
const CBlockIndex* tip = WITH_LOCK(::cs_main, return chainman.ActiveTip());
const bool ibd = chainman.ActiveChainstate().IsInitialBlockDownload();
node.active_ctx->InitializeCurrentBlockTip(tip, ibd);
GetMainSignals().InitializeCurrentBlockTip(tip, ibd);
```
Initialize quorum contexts via UpdatedBlockTip on startup
Switching startup seeding to GetMainSignals().InitializeCurrentBlockTip(...) here skips ActiveContext/ObserverContext initialization logic, because neither context overrides InitializeCurrentBlockTip after this refactor, while their UpdatedBlockTip handlers still perform qdkgsman state updates. On a restart where no new tip arrives immediately, DKG/quorum state is left stale until a later block update, which can delay or break expected quorum behavior right after boot.
thepastaclaw left a comment
Code Review
I validated the current SHA and found one substantive regression in the startup replay path: the refactor replaced the quorum contexts' explicit InitializeCurrentBlockTip() hooks with broad GetMainSignals().InitializeCurrentBlockTip(...) broadcasts, but ActiveContext/ObserverContext no longer override that hook, so their DKG/session state is not seeded on restart until another block arrives. I did not confirm any of the live CodeRabbit inline findings against the current code; the remaining bot comments are either already fixed on this SHA, refer to moved code, or were already retracted in-thread.
I validated existing CodeRabbit comments separately in-thread instead of restating them here.
Reviewed commit: 21fe75f
🔴 1 blocking
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/init.cpp`:
- [BLOCKING] lines 2443-2445: Startup replay no longer seeds the quorum contexts with the current tip
Before this refactor, both `ActiveContext` and `ObserverContext` had explicit `InitializeCurrentBlockTip(tip, ibd)` methods that replayed `UpdatedBlockTip(tip, nullptr, ibd)` on startup and then initialized quorum connections. In the current SHA, init now broadcasts `GetMainSignals().InitializeCurrentBlockTip(tip, ibd)` instead (`src/init.cpp:2443-2445`, repeated after `nodeman->Init()` at `2515-2523`), but the quorum contexts no longer override `InitializeCurrentBlockTip()` and only implement `UpdatedBlockTip()`. As a result, the startup replay never reaches `qdkgsman->UpdatedBlockTip(...)`, so DKG/session state stays at defaults until another block arrives. That leaves restarted masternodes and observer/watch nodes out of sync with any in-progress DKG round immediately after startup.
```cpp
// Non-masternode nodes (including observer-only): broadcast
// to all subscribers now; no proTxHash dependency.
GetMainSignals().InitializeCurrentBlockTip(tip, ibd);
```
🔴 Blocking: Startup replay no longer seeds the quorum contexts with the current tip
Before this refactor, both ActiveContext and ObserverContext had explicit InitializeCurrentBlockTip(tip, ibd) methods that replayed UpdatedBlockTip(tip, nullptr, ibd) on startup and then initialized quorum connections. In the current SHA, init now broadcasts GetMainSignals().InitializeCurrentBlockTip(tip, ibd) instead (src/init.cpp:2443-2445, repeated after nodeman->Init() at 2515-2523), but the quorum contexts no longer override InitializeCurrentBlockTip() and only implement UpdatedBlockTip(). As a result, the startup replay never reaches qdkgsman->UpdatedBlockTip(...), so DKG/session state stays at defaults until another block arrives. That leaves restarted masternodes and observer/watch nodes out of sync with any in-progress DKG round immediately after startup.
💡 Suggested change: reintroduce `InitializeCurrentBlockTip()` on the quorum contexts (or call an equivalent explicit startup hook) so startup still replays `UpdatedBlockTip()` into `qdkgsman` before relying on later tip events.
@thepastaclaw how about UdjinM6@f96405f + UdjinM6@3486ea4?
Issue being fixed or feature implemented
It's a follow-up for #7066 and related PRs
This PR reduces the mental effort needed to read quorum-related code by significantly simplifying the levels of abstraction and the number of classes inheriting from each other.
This PR is a direct dependency of #7234 (kernel / chainstate project)
Performance is not expected to be improved by any noticeable margin.
Compilation time is expected to improve marginally due to fewer files to compile (-2) and fewer transitive includes of src/llmq/quorumsman.h.
What was done?
These changes were spotted during #7234 and extracted into their own PR.
Class hierarchy — before vs. after
*by claude
Before:
QuorumObserverParent (interface, implemented by CQuorumManager)
│
QuorumObserver (base class with shared state)
├── ObserverContext (observer mode, also CValidationInterface)
└── QuorumParticipant (active mode, in src/active/quorums.h)
After:
QuorumRole (pure virtual interface in quorumsman.h)
├── ObserverContext (observer mode, in src/llmq/observer.h)
└── ActiveContext (active mode, QuorumParticipant inlined)
How Has This Been Tested?
Run unit & functional tests
Breaking Changes
N/A
Checklist: