-
Notifications
You must be signed in to change notification settings - Fork 2
Reordering Algorithms
GraphBrew implements 17 algorithm IDs (0-16):
- 2 Baselines (IDs 0-1): ORIGINAL and RANDOM — graph states, not reordering techniques. Used to establish reference benchmark times.
- 13 Reordering Algorithms (IDs 2-12, 15-16): Produce vertex reorderings. Benchmarked to measure speedup vs baselines.
- 2 Meta-Algorithms (IDs 13-14): MAP (external label map) and AdaptiveOrder (ML selector). Not included in standard benchmarks.
Note: Algorithm ID 13 (MAP) is reserved for external label mapping files, not a standalone reordering algorithm.
Graph algorithms spend significant time accessing memory. When vertices are ordered randomly, memory access patterns are unpredictable, causing cache misses. Reordering places frequently co-accessed vertices together in memory, dramatically improving cache utilization.
Before Reordering: After Reordering:
Vertex 1 → 5, 99, 2000 Vertex 1 → 2, 3, 4
Vertex 2 → 8, 1500, 3 Vertex 2 → 1, 3, 5
(scattered neighbors) (nearby neighbors)
The effect is visible when plotting the adjacency matrix — a well-ordered graph has non-zero entries clustered near the diagonal, while a poorly-ordered graph has them scattered:


| Category | Algorithms | Best For |
|---|---|---|
| Basic | ORIGINAL, RANDOM, SORT | Baseline comparisons |
| Hub-Based | HUBSORT, HUBCLUSTER | Power-law graphs |
| DBG-Based | DBG, HUBSORTDBG, HUBCLUSTERDBG | Cache locality |
| Community | RABBITORDER | Hierarchical communities |
| Classic | GORDER, CORDER, RCM | Bandwidth reduction |
| Leiden-Based | LeidenOrder (15, baseline) | Strong community structure |
| Flow-Edge | GoGraphOrder (16) | Directed graphs (M(σ) optimization) |
| Hybrid | GraphBrewOrder (12), MAP (13), AdaptiveOrder (14) | External/Adaptive selection |
Keep original vertex ordering
./bench/bin/pr -f graph.el -s -o 0 -n 3- Description: Uses vertices in their original order from the input file
- Complexity: O(1) - no reordering
- Best for: Baseline comparison, already well-ordered graphs
- When to use: Always run this first to establish baseline performance
Random vertex permutation
./bench/bin/pr -f graph.el -s -o 1 -n 3- Description: Randomly shuffles all vertices
- Complexity: O(n) where n = number of vertices
- Best for: Testing, worst-case scenarios
- When to use: Debugging, establishing worst-case baseline
Sort vertices by degree
./bench/bin/pr -f graph.el -s -o 2 -n 3- Description: Sorts vertices by degree (descending by default, high-degree first)
- Complexity: O(n log n)
- Best for: Grouping high-degree vertices together for cache locality
- When to use: Simple degree-based baseline, quick improvement over original ordering
These algorithms prioritize high-degree vertices (hubs) which are accessed frequently.
Sort by degree (hubs first)
./bench/bin/pr -f graph.el -s -o 3 -n 3- Description: Places high-degree vertices (hubs) at the beginning
- Complexity: O(n log n)
- Rationale: Hubs are accessed most frequently; placing them together improves cache reuse
- Best for: Power-law graphs (social networks, web graphs)
How it works:
Original: v1(deg=5), v2(deg=100), v3(deg=2), v4(deg=50)
After: v2(deg=100), v4(deg=50), v1(deg=5), v3(deg=2)

Cluster hubs with their neighbors
./bench/bin/pr -f graph.el -s -o 4 -n 3- Description: Places each hub followed by its neighbors
- Complexity: O(n + m) where m = number of edges
- Rationale: When accessing a hub, its neighbors are likely accessed next
- Best for: Graphs with hub-and-spoke patterns
How it works:
Hub v2 has neighbors: v1, v5, v8
Ordering: v2, v1, v5, v8, [next hub], ...

Degree-Based Grouping (DBG) creates "frequency zones" based on access patterns.
Degree-Based Grouping
./bench/bin/pr -f graph.el -s -o 5 -n 3- Description: Groups vertices by degree into logarithmic buckets
- Complexity: O(n)
- Rationale: Vertices with similar degrees have similar access frequencies
- Best for: General-purpose, works well on most graphs
Bucket structure:
Bucket 0: degree 1
Bucket 1: degree 2-3
Bucket 2: degree 4-7
Bucket 3: degree 8-15
...
HUBSORT within DBG buckets
./bench/bin/pr -f graph.el -s -o 6 -n 3- Description: First groups by DBG, then sorts each bucket by degree
- Complexity: O(n log n)
- Best for: Combines benefits of both approaches
HUBCLUSTER within DBG buckets
./bench/bin/pr -f graph.el -s -o 7 -n 3- Description: First groups by DBG, then clusters hubs with neighbors in each bucket
- Complexity: O(n + m)
- Best for: Power-law graphs with clear hub structure
All hub-based algorithms (HUBSORT, HUBCLUSTER, DBG, HUBSORTDBG, HUBCLUSTERDBG) include guards for empty subgraphs:
// Guard against empty graphs (prevents division by zero)
if (num_nodes == 0) {
return; // Nothing to reorder
}
const int64_t avgDegree = num_edges / num_nodes;This is important when these algorithms are used as the final algorithm in GraphBrewOrder, where community subgraphs may have no internal edges on graphs with extreme structure (e.g., Kronecker graphs).
These algorithms use different approaches: RabbitOrder detects communities, while GORDER, CORDER, and RCM focus on bandwidth reduction and cache optimization.
Rabbit Order (community + incremental aggregation using Louvain)
# Format: -o 8[:variant] where variant = csr (default) or boost
./bench/bin/pr -f graph.el -s -o 8 -n 3 # Default: CSR variant (native, fast)
./bench/bin/pr -f graph.el -s -o 8:csr -n 3 # Explicit CSR variant
./bench/bin/pr -f graph.el -s -o 8:boost -n 3 # Original Boost-based variant- Description: Hierarchical community detection with incremental aggregation (Louvain-based)
- Complexity: O(n log n) average
-
Variants:
-
csr(default): Native CSR implementation — faster, no external dependencies, improved cache locality thanks to correctness fixes and auto-adaptive resolution -
boost: Original Boost-based implementation — requires Boost library. Kept as a reference baseline.
-
-
Note: RabbitOrder is enabled by default (
RABBIT_ENABLE=1in Makefile) - Best for: Large graphs with hierarchical community structure
- Limitation: Uses Louvain (no refinement), can over-merge communities
CSR variant improvements (over the original CSR code):
- Edge weight normalization: Fixed total weight computation to avoid doubling an already-correct 2m value
- Self-loop handling: Self-loops are now included in vertex strength, matching Boost
- Permutation ordering: All single-vertex communities are interleaved in discovery order (matching Boost), instead of being segregated to the end
-
Auto-adaptive resolution: γ = clamp(14 / avg_degree, 0.5, 1.0) — prevents over-merging on dense graphs. Override via
RABBIT_RESOLUTIONenv var.
Directed graph note: Both variants read only out-edges and use the undirected modularity approximation from the original paper. The CSR fixes are valid for both symmetric and directed inputs.
Key insight: Uses a "rabbit" metaphor where vertices "hop" to form communities.

Comparison with GVE-Leiden (Algorithm 15):
| Metric | RabbitOrder | GVE-Leiden |
|---|---|---|
| Algorithm | Louvain (no refinement) | Leiden (with refinement) |
| Community Quality | Good | Better |
| Speed | Faster | Slightly slower |
| Over-merging | Can occur | Prevented by refinement |
Graph Ordering (sliding window cache optimization)
# Default (GoGraph baseline)
./bench/bin/pr -f graph.el -s -o 9 -n 3
# CSR-native variant — faster reordering, no GoGraph conversion
./bench/bin/pr -f graph.el -s -o 9:csr -n 3- Description: Greedy sliding-window algorithm that places vertices to maximize the number of 2-hop neighbors already in a cache-sized window. Pre-orders with BFS-RCM, then greedily picks the highest-scoring candidate at each step.
- Complexity: O(n × m × w) where w = window size (default 7)
- Best for: Graphs where local structure matters
-
Variants:
-
default: GoGraph baseline — converts to GoGraph adjacency format, runs original C++ GOrder -
csr: CSR-native variant — operates directly on CSRGraph iterators via lightweight BFS-RCM pre-ordering andRelabelByMappingStandalone. Faster reordering than the GoGraph baseline, equivalent PR performance, deterministic with single thread. -
fast: Parallel batch variant — scalable across threads via batch extraction with atomic score updates. Uses fan-out cap (64), stricter hub threshold (n^⅓), and active frontier for efficient extraction. Auto-tunes batch/window to thread count.
-
How it works:
- Pre-order vertices with BFS-RCM for initial locality
- Initialize priority heap with in-degree as key
- Greedy loop: extract max-priority vertex v, place it, update sliding window W:
- Push v: increment scores of v's 2-hop neighbors (out→in paths)
- Pop oldest: decrement scores of oldest's 2-hop neighbors
- Hub vertices (degree > √n) are skipped in 2-hop expansion
Fast variant details:
- Replaces UnitHeap with score array + atomic delta for thread safety
- Batch extraction: top-B vertices placed per round (B=max(64, 4×threads))
- Window auto-scaled: W=max(7, 2×batch) — still within L1 cache
- Fan-out cap: 2-hop expansion limited to first 64 out-neighbors per in-neighbor
- Per-thread dedup arrays eliminate redundant atomic exchanges in merge phase
- Usage:
-o 9:fast(recommended for graphs with high degree variance)
Cache-aware Ordering
./bench/bin/pr -f graph.el -s -o 10 -n 3- Description: Explicitly optimizes for CPU cache hierarchy
- Complexity: O(n + m)
- Best for: Cache-sensitive applications
Reverse Cuthill-McKee
# Default (GoGraph double-RCM)
./bench/bin/pr -f graph.el -s -o 11 -n 3
# BNF variant — faster reordering with George-Liu + BNF starting node
./bench/bin/pr -f graph.el -s -o 11:bnf -n 3- Description: Classic bandwidth-reduction algorithm (BFS-based)
- Complexity: O(n + m)
- Best for: Sparse matrices, scientific computing graphs, road networks
- Note: Originally designed for sparse matrix solvers
-
Variants:
-
default: GoGraph double-RCM — runs two full RCM passes (Transform + explicit), high-quality ordering -
bnf: CSR-native BNF variant — George-Liu pseudo-peripheral node finder with BNF width-minimizing criterion from RCM++ (Hou et al. 2024), deterministic parallel CM BFS via speculative atomic_min resolution (inspired by Mlakar et al. 2021). Faster reordering than the GoGraph baseline, comparable algorithm performance
-
How it works:
- Start from a peripheral vertex (far from center)
- BFS traversal, ordering by increasing degree
- Reverse the final ordering
BNF variant improvements:
- George-Liu iteration finds a pseudo-peripheral starting node (higher eccentricity → better bandwidth)
- BNF criterion (RCM++ paper) selects the candidate with minimum BFS level-set width
- Deterministic parallel BFS: two-phase speculative discovery where the lowest-index frontier parent always wins shared neighbors (matching serial CM behavior)
- CSR-native — no intermediate adjacency-list conversion
- Parallel component processing: Components processed independently via prefix-sum offsets (no serial bottleneck)
- Fast-path for small components: Skip BNF for components ≤32 vertices; max 3 George-Liu iterations for ≤256
GraphBrew-RCM variant (-o 12:rcm): GraphBrewOrder also supports a composable RCM variant that applies Leiden community detection, then runs RCM within each community in parallel (embarrassingly parallel), and orders communities via BNF-quality RCM on the super-graph. See GraphBrewOrder for details.

Per-community reordering with variant support
# Format: -o 12[:variant[:frequency[:intra_algo[:resolution[:maxIterations[:maxPasses]]]]]]
./bench/bin/pr -f graph.el -s -o 12 -n 3 # Use defaults (leiden variant)
./bench/bin/pr -f graph.el -s -o 12:leiden -n 3 # Explicit leiden variant
./bench/bin/pr -f graph.el -s -o 12:rabbit -n 3 # RabbitOrder single-pass
./bench/bin/pr -f graph.el -s -o 12:hubcluster -n 3 # Hub-based clustering- Description: Runs GraphBrew +community detection, then applies per-community reordering
-
Variants (powered by GraphBrew pipeline):
-
leiden: GraphBrew Leiden with GVE-CSR aggregation - default -
rabbit: GraphBrew RabbitOrder community detection (supports all ordering strategies) -
hubcluster: GraphBrew Leiden + hub-cluster ordering -
rcm: GraphBrew Leiden + per-community RCM + super-graph BNF-RCM (bandwidth minimization)
-
-
Parameters:
-
final_algo: Algorithm ID (0-11) to use within communities (default: 8 = RabbitOrder) -
resolution: Leiden resolution parameter (default: auto-computed from graph) -
maxIterations: Maximum Leiden iterations (default: 10) -
maxPasses: Maximum Leiden passes (default: 10)
-
-
Dynamic thresholds: Community size thresholds are computed dynamically based on
avg_community_size/4andsqrt(N) - Best for: Fine-grained control over per-community ordering
GraphBrew Unified Framework: GraphBrewOrder provides a unified interface for graph reordering. Both Leiden and RabbitOrder community detection can be freely combined with any of the 14 ordering strategies. The graphbrew prefix is not required — pass ordering strategies directly:
# Leiden-based ordering strategies:
./bench/bin/pr -f graph.mtx -s -o 12 -n 3 # Default (leiden + per-community RabbitOrder)
./bench/bin/pr -f graph.mtx -s -o 12:hrab -n 3 # Hybrid Leiden+RabbitOrder (best locality)
./bench/bin/pr -f graph.mtx -s -o 12:dfs -n 3 # DFS dendrogram traversal
./bench/bin/pr -f graph.mtx -s -o 12:conn -n 3 # Connectivity BFS within communities
./bench/bin/pr -f graph.mtx -s -o 12:0.75 -n 3 # Fixed resolution (0.75)
# Rabbit-based ordering strategies (Rabbit detection × any ordering):
./bench/bin/pr -f graph.mtx -s -o 12:rabbit -n 3 # RabbitOrder native DFS (default)
./bench/bin/pr -f graph.mtx -s -o 12:rabbit:dbg -n 3 # Rabbit communities + DBG ordering
./bench/bin/pr -f graph.mtx -s -o 12:rabbit:hubcluster -n 3 # Rabbit communities + hub-cluster
./bench/bin/pr -f graph.mtx -s -o 12:rabbit:hrab -n 3 # Rabbit communities + hybrid orderingKey recommendations:
-
Best overall:
12:leiden(default) — Leiden + per-community RabbitOrder -
Fastest:
12:rabbit— single-pass RabbitOrder pipeline -
Power-law:
12:hubcluster— hub-aware community ordering -
Large power-law:
12:rabbit:dbg— fast Rabbit detection + DBG degree-grouping -
Large modular:
12:rabbit:hubcluster— fast Rabbit detection + hub-cluster ordering
See Command-Line-Reference#graphbreworder-12 for the full option reference and Community-Detection for algorithm details.


Load mapping from file
./bench/bin/pr -f graph.el -s -o 13:mapping.lo -n 3- Description: Loads a pre-computed vertex ordering from file
-
File formats:
.lo(list order) or.so(source order) - Best for: Using externally computed orderings
Perceptron-based algorithm selection
./bench/bin/pr -f graph.el -s -o 14 -n 3 # Default: full-graph, fastest-execution
./bench/bin/pr -f graph.el -s -o 14::::0 -n 3 # Full-graph, fastest-reorder
./bench/bin/pr -f graph.el -s -o 14::::1:web-Google -n 3 # With graph name hint- Description: Uses ML to select the best algorithm for the graph
- Features: 18 linear + 5 quadratic cross-terms + 1 IISWC'18 CL (24 total)
- Safety: OOD guardrail, ORIGINAL margin fallback
-
CLI format:
-o 14[:_[:_[:_[:selection_mode[:graph_name]]]]](positions 0-2 reserved) - Selection modes: 0=fastest-reorder, 1=fastest-execution (default), 2=best-endtoend, 3=best-amortization, 4=decision-tree, 5=hybrid (DT+perceptron), 6=database (kNN)

See AdaptiveOrder-ML for the full ML model details.
GraphBrew provides LeidenOrder as a baseline reference implementation.
Leiden community detection via GVE-Leiden library
# Format: -o 15:resolution
./bench/bin/pr -f graph.el -s -o 15 -n 3 # Default (auto-resolution)
./bench/bin/pr -f graph.el -s -o 15:0.75 -n 3 # Lower resolution
./bench/bin/pr -f graph.el -s -o 15:1.5 -n 3 # Higher resolution- Description: Leiden community detection using the external GVE-Leiden library (requires CSR→DiGraph conversion)
- Complexity: O(n log n) average
- Best for: Baseline comparison — measures how much GraphBrewOrder (12) improved over the reference implementation
- Default resolution: Auto-detected via continuous formula (0.5-1.2) with CV guardrail for power-law graphs
- Note: The SSOT defines IDs 0-16. LeidenCSR's CSR-native Leiden implementation is part of GraphBrewOrder (12).
Key features:
- Uses GVE-Leiden C++ library by Subhajit Sahu (
external/leiden/) - Requires CSR → DiGraph format conversion (adds overhead)
- Produces high-quality modularity scores (reference quality)
Flow-edge M(σ) optimization
# Format: -o 16[:variant]
./bench/bin/pr -f graph.el -s -o 16 -n 3 # Default (optimised faithful)
./bench/bin/pr -f graph.el -s -o 16:fast -n 3 # Iterative flow-score sorting
./bench/bin/pr -f graph.el -s -o 16:naive -n 3 # Original faithful (validation)- Description: Maximises M(σ) — the count of edges where source precedes destination in the ordering. Primarily benefits directed graphs (M(σ) is constant for symmetric/undirected graphs).
-
Variants:
-
default: Optimised faithful — flat delta array, merged pev, degree-1 short-circuit -
fast: Iterative flow-score sorting — heuristic O(n log n + m) per iteration -
naive: Original faithful (per-call map, for validation)
-
- Best for: Directed graphs where edge-direction locality matters
See Command-Line-Reference#gographorder-variants-algorithm-16 for the full variant reference.
GraphBrew supports chained orderings — applying two reordering algorithms sequentially. The first pass reorders the graph, then the second pass reorders the already-reordered graph. This is a pregeneration-only concept (the C++ AdaptiveOrder perceptron selects a single algorithm ID).
# SORT then RABBITORDER: degree-sort first, then cache-aware BFS
./bench/bin/converter -f graph.el -s -o 2 -o 8:csr -b graph_sorted_rabbit.sg
# HUBCLUSTERDBG then RABBITORDER
./bench/bin/converter -f graph.el -s -o 7 -o 8:csr -b graph_hcdbg_rabbit.sgCanonical names use + as separator: SORT+RABBITORDER_csr, HUBCLUSTERDBG+RABBITORDER_csr.
Current chains defined in scripts/lib/core/utils.py → _CHAINED_ORDERING_OPTS:
| Chain | Canonical Name | Rationale |
|---|---|---|
-o 2 -o 8:csr |
SORT+RABBITORDER_csr |
Degree sort + cache-aware BFS |
-o 2 -o 8:boost |
SORT+RABBITORDER_boost |
Degree sort + Boost-based BFS |
-o 7 -o 8:csr |
HUBCLUSTERDBG+RABBITORDER_csr |
Hub-cluster + cache BFS |
-o 2 -o 12:leiden:flat |
SORT+GraphBrewOrder_leiden |
Degree sort + community-aware |
-o 5 -o 12:leiden:flat |
DBG+GraphBrewOrder_leiden |
DBG + community-aware |
See chain_canonical_name() and CHAINED_ORDERINGS in scripts/lib/core/utils.py.
Note: For current variant lists, see
scripts/lib/core/utils.pywhich defines:
RABBITORDER_VARIANTS,GORDER_VARIANTS,RCM_VARIANTS,GRAPHBREW_VARIANTS- Use
get_algo_variants(algo_id)to query programmatically
| Graph Type | Recommended | Alternatives |
|---|---|---|
| Social Networks | GraphBrewOrder (12) | 12:rabbit:dbg, 12:rabbit:hubcluster |
| Web Graphs | GraphBrewOrder (12) | 12:rabbit:dbg, HUBCLUSTERDBG (7) |
| Road Networks | ORIGINAL (0), RCM (11) | GraphBrewOrder (12) |
| Citation Networks | GraphBrewOrder (12) | 12:rabbit:hubcluster, LeidenOrder (15) |
| Random Geometric | GraphBrewOrder (12) | GraphBrewOrder (12:rabbit) |
| Unknown | GraphBrewOrder (12) | AdaptiveOrder (14) |
| Size | Nodes | Recommended |
|---|---|---|
| Small | < 100K | Any (try several) |
| Medium | 100K - 1M | GraphBrewOrder (12) |
| Large | 1M - 100M | GraphBrewOrder (12), 12:rabbit:dbg, 12:rabbit:hubcluster |
| Very Large | > 100M | 12:rabbit:dbg, 12:rabbit:hubcluster, 12:rabbit |
Is your graph modular (has communities)?
├── Yes → Is it very large (>10M vertices)?
│ ├── Yes → 12:rabbit:dbg or 12:rabbit:hubcluster (fast detection + good ordering)
│ │ GraphBrewOrder (12) for best quality
│ └── No → GraphBrewOrder (12) - best quality
└── No/Unknown → Is it a power-law graph?
├── Yes → 12:rabbit:dbg or HUBCLUSTERDBG (7)
└── No → Try AdaptiveOrder (14)
- Running-Benchmarks - How to run experiments
- AdaptiveOrder-ML - Deep dive into ML-based selection
- Adding-New-Algorithms - Implement your own algorithm