Merged
43 changes: 21 additions & 22 deletions src/llmq/net_signing.cpp
@@ -306,17 +306,25 @@ void NetSigning::WorkThreadDispatcher()
}
}

if (m_shares_manager->IsAnyPendingProcessing()) {
// If there's processing work, spawn a helper worker
worker_pool.push([this](int) {
Collaborator
@knst knst Mar 16, 2026
The total number of workers is limited to 4. If there is work to be done, it should be pulled out and pushed to the queue as fast as possible, even after 10ms. The problem: if a new worker is created faster than the old worker drains the queue, the new worker is spawned to process the same queue, finds it empty, and just burns CPU cycles.

This implementation with the atomic processing_worker_scheduled looks legit and probably works [I haven't reviewed it in detail / haven't tested], but it won't run two calls of ProcessPendingSigShares in parallel. So several instances of SignAndProcessSingleShare can be spawned, but only one instance of ProcessPendingSigShares can run at a time.

I think a much better solution would be to split ProcessPendingSigShares into two steps: CollectPendingSigSharesToVerify [called synchronously, without a background worker — it should be fast enough since it does no BLS validation], followed by the actual ProcessPendingSigShares call over the prepared queue of sig shares, processed asynchronously via worker_pool. That way more than one ProcessPendingSigShares worker can be created when there are many pending shares (more than 32, per nMaxBatchSize).

Author

Good point — you're right that the atomic guard serializes ProcessPendingSigShares to a single concurrent instance.

The current fix was scoped narrowly: prevent the unbounded queue growth from redundant task spawns (the old workers would find nothing and immediately exit, so processing was effectively serial already — just with wasted allocations/queue churn).

Your suggestion to hoist CollectPendingSigSharesToVerify into the dispatcher and fan out verification workers per batch is a better architecture for actual parallelism. The collection step is cheap (no BLS ops), and the verification is where the real CPU time goes — batching those across workers would scale much better under load.

Happy to refactor this PR to that approach, or land this as a conservative fix and follow up with the parallel design. What's your preference?

Author

Good point on the parallelism limitation — you're right that the current approach serialises all ProcessPendingSigShares calls.

To be transparent: the atomic flag approach was the minimal fix for the immediate problem (unbounded queue growth from firing a task every 10ms regardless of whether a worker was already running). That bug was real and concrete, so the goal here was the smallest correct fix.

Your suggestion — call CollectPendingSigSharesToVerify synchronously in the dispatcher, then push the collected batch data into a worker for BLS verification — is architecturally cleaner and would allow multiple BLS verification batches to run in parallel. The collect step holds cs only briefly and does no BLS work, so it's safe to call synchronously.

I think this is the right direction, but it's a larger refactor: the current ProcessPendingSigShares in net_signing.cpp owns the entire collect→verify→process pipeline, so pulling the collect step out requires restructuring what data flows between the dispatcher thread and the workers.

A few questions before I take a stab at it:

  1. Should the collect step move entirely into the dispatcher loop (replacing IsAnyPendingProcessing + the current guard), or should it be a separate method called from the dispatcher?
  2. For the collected batch, should the worker call the existing ProcessPendingSigShares(sigSharesByNodes, quorums) overload directly, or does that need to change too?

Happy to implement if you think the change is worth doing now vs. a follow-up.

Collaborator
@knst knst Mar 16, 2026

> queue growth from firing a task every 10ms regardless

It's limited to 4 workers at once, isn't it? So it won't really "grow" in practice, due to the very small threshold, will it?

> A few questions before I take a stab at it:

Just pull this code out of NetSigning::ProcessPendingSigShares into WorkThreadDispatcher, call it outside the worker spawning, and when creating the worker for ProcessPendingSigShares pass sigSharesByNodes and quorums as arguments.

    std::unordered_map<NodeId, std::vector<CSigShare>> sigSharesByNodes;
    std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher> quorums;

    const size_t nMaxBatchSize{32};
    bool more_work = m_shares_manager->CollectPendingSigSharesToVerify(nMaxBatchSize, sigSharesByNodes, quorums);
    if (sigSharesByNodes.empty()) {
        return false;
    }

As a follow-up, consider a further improvement: refactor CollectPendingSigSharesToVerify so that sigSharesByNodes is initialized inside the dispatcher, but quorums inside the NetSigning::ProcessPendingSigShares call.

Author

Implemented — force-pushed with the approach you suggested.

The dispatcher now calls CollectPendingSigSharesToVerify synchronously in a loop, moving each batch into its own worker for parallel BLS verification:

while (!workInterrupt) {
    // ... collect synchronously ...
    bool more_work = m_shares_manager->CollectPendingSigSharesToVerify(nMaxBatchSize, sigSharesByNodes, quorums);
    if (sigSharesByNodes.empty()) break;

    worker_pool.push([this, sigSharesByNodes = std::move(sigSharesByNodes),
                      quorums = std::move(quorums)](int) mutable {
        ProcessPendingSigShares(std::move(sigSharesByNodes), std::move(quorums));
    });
    if (!more_work) break;
}

ProcessPendingSigShares now takes sigSharesByNodes and quorums as parameters (by value, moved from the lambda capture). The atomic guard is removed — no longer needed since each Collect call dequeues unique data.

Left the quorums lookup inside ProcessPendingSigShares for now per your suggestion to consider that as a follow-up refactor.

Author

You're right — the pool size (4 workers max) bounds the concurrency, so the queue doesn't grow truly unboundedly. The old behavior was more 'wasteful' than 'dangerous' — redundant tasks that find nothing to process.

The updated implementation addresses both concerns: no wasted allocations AND parallel BLS verification when multiple batches are available.

while (!workInterrupt) {
    bool moreWork = ProcessPendingSigShares();

    if (!moreWork) {
        return; // No work found, exit immediately
    }
}
// Collect pending sig shares synchronously and dispatch each batch to a worker for parallel BLS verification
while (!workInterrupt) {
    std::unordered_map<NodeId, std::vector<CSigShare>> sigSharesByNodes;
    std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher> quorums;

    const size_t nMaxBatchSize{32};
    bool more_work = m_shares_manager->CollectPendingSigSharesToVerify(nMaxBatchSize, sigSharesByNodes, quorums);
P1: Prevent pre-draining sigshare queue before dedupe state updates

Calling CollectPendingSigSharesToVerify in this tight dispatcher loop and immediately enqueueing each batch means collection can outrun verification, so many batches are removed before workers update dedupe state. Collection dedupes only against sigShares.Has(...) in CSigSharesManager::CollectPendingSigSharesToVerify, but that set is updated later in ProcessSigShare, so under duplicate-share floods this change can enqueue and verify the same key far more times than before, creating avoidable CPU amplification and delaying recovery progress.

    if (sigSharesByNodes.empty()) {
        break;
    }

    worker_pool.push([this, sigSharesByNodes = std::move(sigSharesByNodes), quorums = std::move(quorums)](int) mutable {
        ProcessPendingSigShares(std::move(sigSharesByNodes), std::move(quorums));
    });

    if (!more_work) {
        break;
    }
}

// Always sleep briefly between checks
@@ -329,17 +337,10 @@ void NetSigning::NotifyRecoveredSig(const std::shared_ptr<const CRecoveredSig>&
    m_peer_manager->PeerRelayRecoveredSig(*sig, proactive_relay);
}

bool NetSigning::ProcessPendingSigShares()
void NetSigning::ProcessPendingSigShares(
    std::unordered_map<NodeId, std::vector<CSigShare>>&& sigSharesByNodes,
    std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher>&& quorums)
{
    std::unordered_map<NodeId, std::vector<CSigShare>> sigSharesByNodes;
    std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher> quorums;

    const size_t nMaxBatchSize{32};
    bool more_work = m_shares_manager->CollectPendingSigSharesToVerify(nMaxBatchSize, sigSharesByNodes, quorums);
    if (sigSharesByNodes.empty()) {
        return false;
    }

    // It's ok to perform insecure batched verification here as we verify against the quorum public key shares,
    // which are not craftable by individual entities, making the rogue public key attack impossible
    CBLSBatchVerifier<NodeId, SigShareKey> batchVerifier(false, true);
@@ -400,8 +401,6 @@ bool NetSigning::ProcessPendingSigShares()
            ProcessRecoveredSig(rs, true);
        }
    }

    return more_work;
}

} // namespace llmq
6 changes: 5 additions & 1 deletion src/llmq/net_signing.h
@@ -6,6 +6,8 @@
#define BITCOIN_LLMQ_NET_SIGNING_H

#include <ctpl_stl.h>
#include <llmq/signing_shares.h>
#include <llmq/types.h>
#include <net_processing.h>
#include <util/threadinterrupt.h>
#include <util/time.h>
Expand Down Expand Up @@ -52,7 +54,9 @@ class NetSigning final : public NetHandler, public CValidationInterface
void WorkThreadCleaning();
void WorkThreadDispatcher();

bool ProcessPendingSigShares();
void ProcessPendingSigShares(
std::unordered_map<NodeId, std::vector<CSigShare>>&& sigSharesByNodes,
std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher>&& quorums);

void RemoveBannedNodeStates();
void BanNode(NodeId nodeid);