fix: collect sigshares synchronously in dispatcher for parallel verification #7225
Changes from all commits
```diff
@@ -306,17 +306,25 @@ void NetSigning::WorkThreadDispatcher()
             }
         }

-        if (m_shares_manager->IsAnyPendingProcessing()) {
-            // If there's processing work, spawn a helper worker
-            worker_pool.push([this](int) {
-                while (!workInterrupt) {
-                    bool moreWork = ProcessPendingSigShares();
-                    if (!moreWork) {
-                        return; // No work found, exit immediately
-                    }
-                }
+        // Collect pending sig shares synchronously and dispatch each batch to a worker for parallel BLS verification
+        while (!workInterrupt) {
+            std::unordered_map<NodeId, std::vector<CSigShare>> sigSharesByNodes;
+            std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher> quorums;
+
+            const size_t nMaxBatchSize{32};
+            bool more_work = m_shares_manager->CollectPendingSigSharesToVerify(nMaxBatchSize, sigSharesByNodes, quorums);
+
+            if (sigSharesByNodes.empty()) {
+                break;
+            }
+
+            worker_pool.push([this, sigSharesByNodes = std::move(sigSharesByNodes), quorums = std::move(quorums)](int) mutable {
+                ProcessPendingSigShares(std::move(sigSharesByNodes), std::move(quorums));
+            });
+
+            if (!more_work) {
+                break;
+            }
+        }

         // Always sleep briefly between checks
@@ -329,17 +337,10 @@ void NetSigning::NotifyRecoveredSig(const std::shared_ptr<const CRecoveredSig>&
     m_peer_manager->PeerRelayRecoveredSig(*sig, proactive_relay);
 }

-bool NetSigning::ProcessPendingSigShares()
+void NetSigning::ProcessPendingSigShares(
+    std::unordered_map<NodeId, std::vector<CSigShare>>&& sigSharesByNodes,
+    std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher>&& quorums)
 {
-    std::unordered_map<NodeId, std::vector<CSigShare>> sigSharesByNodes;
-    std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher> quorums;
-
-    const size_t nMaxBatchSize{32};
-    bool more_work = m_shares_manager->CollectPendingSigSharesToVerify(nMaxBatchSize, sigSharesByNodes, quorums);
-    if (sigSharesByNodes.empty()) {
-        return false;
-    }

     // It's ok to perform insecure batched verification here as we verify against the quorum public key shares,
     // which are not craftable by individual entities, making the rogue public key attack impossible
     CBLSBatchVerifier<NodeId, SigShareKey> batchVerifier(false, true);
@@ -400,8 +401,6 @@ bool NetSigning::ProcessPendingSigShares()
             ProcessRecoveredSig(rs, true);
         }
     }
-
-    return more_work;
 }

 } // namespace llmq
```
The total number of workers is limited to 4. If there is work to be done, it should be pulled out and pushed to the queue as fast as possible, even after 10ms.

The problem: if new workers are created faster than old workers can drain the queue, a new worker will be spawned to process the same queue, find nothing in it, and just burn CPU cycles.
This implementation with the atomic `processing_worker_scheduled` looks legit and probably works [I haven't reviewed in detail / haven't tested], but it won't run 2 calls of `ProcessPendingSigShares` in parallel. So several instances of `SignAndProcessSingleShare` could be spawned, but only one instance of `ProcessPendingSigShares` can run at a time.

I think a much better solution would be splitting `ProcessPendingSigShares` into 2 steps: `CollectPendingSigSharesToVerify` [called synchronously, without a background worker; it should be fast enough as it doesn't do any BLS validation inside], and after that the actual call of `ProcessPendingSigShares` over the ready queue of prepared sig shares, processed asynchronously (using `worker_pool`). In this case more than one worker could be created for `ProcessPendingSigShares` if many shares are pending (more than 32, according to `nMaxBatchSize`).
Good point: you're right that the atomic guard serializes `ProcessPendingSigShares` to a single concurrent instance.

The current fix was scoped narrowly: prevent the unbounded queue growth from redundant task spawns (the old workers would find nothing and immediately exit, so processing was effectively serial already, just with wasted allocations and queue churn).

Your suggestion to hoist `CollectPendingSigSharesToVerify` into the dispatcher and fan out verification workers per batch is a better architecture for actual parallelism. The collection step is cheap (no BLS ops), and the verification is where the real CPU time goes; batching those across workers would scale much better under load.

Happy to refactor this PR to that approach, or land this as a conservative fix and follow up with the parallel design. What's your preference?
Good point on the parallelism limitation: you're right that the current approach serialises all `ProcessPendingSigShares` calls.

To be transparent: the atomic flag approach was the minimal fix for the immediate problem (unbounded queue growth from firing a task every 10ms regardless of whether a worker was already running). That bug was real and concrete, so the goal here was the smallest correct fix.

Your suggestion to call `CollectPendingSigSharesToVerify` synchronously in the dispatcher, then push the collected batch data into a worker for BLS verification, is architecturally cleaner and would allow multiple BLS verification batches to run in parallel. The collect step holds `cs` only briefly and does no BLS work, so it's safe to call synchronously.

I think this is the right direction, but it's a larger refactor: the current `ProcessPendingSigShares` in `net_signing.cpp` owns the entire collect → verify → process pipeline, so pulling collect out requires restructuring what data flows between the dispatcher thread and the workers.

A few questions before I take a stab at it:

- Should the dispatcher keep the existing gating (`IsAnyPendingProcessing` + the current guard), or should it be a separate method called from the dispatcher?
- Can the workers call the `ProcessPendingSigShares(sigSharesByNodes, quorums)` overload directly, or does that need to change too?

Happy to implement if you think the change is worth doing now vs. a follow-up.
It's limited to 4 workers at once, isn't it? So it won't really "grow" in reality, due to the very small threshold, will it?

Just pull this code out of `NetSigning::ProcessPendingSigShares` into `WorkThreadDispatcher`, call it outside of the worker spawning, and when creating the worker for `ProcessPendingSigShares` pass `sigSharesByNodes` and `quorums` as arguments.

As a follow-up, consider a further improvement by refactoring `CollectPendingSigSharesToVerify`: init `sigSharesByNodes` inside the dispatcher, but `quorums` inside the `NetSigning::ProcessPendingSigShares` call.
Implemented: force-pushed with the approach you suggested.

The dispatcher now calls `CollectPendingSigSharesToVerify` synchronously in a loop, moving each batch into its own worker for parallel BLS verification. `ProcessPendingSigShares` now takes `sigSharesByNodes` and `quorums` as parameters (by value, moved from the lambda capture). The atomic guard is removed; it's no longer needed since each Collect call dequeues unique data.

Left the `quorums` lookup inside `ProcessPendingSigShares` for now, per your suggestion to consider that as a follow-up refactor.
You're right: the pool size (4 workers max) bounds the concurrency, so the queue doesn't grow truly unboundedly. The old behavior was more "wasteful" than "dangerous": redundant tasks that find nothing to process.

The updated implementation addresses both concerns: no wasted allocations, and parallel BLS verification when multiple batches are available.