BE-429: HashQL: Add island dependency graph with data requirement resolution#8501
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
PR SummaryMedium Risk Overview Adds dominator-based data requirement resolution that marks which traversal paths each island must provide downstream and inserts/reuses synthetic data islands when no dominating origin provider exists. Adds Written by Cursor Bugbot for commit b8fe72b. This will update automatically on new commits. Configure here. |
🤖 Augment PR SummarySummary: Adds an island dependency graph and scheduler to the HashQL MIR execution engine, including automatic data requirement resolution across targets. Changes:
Technical Notes: Data island selection uses the first set origin target (backend-priority order), and schedule levels are derived from predecessor constraints to enable safe parallel execution. 🤖 Was this summary useful? React with 👍 or 👎 |
Merging this PR will degrade performance by 19.34%
Performance Changes
Comparing Footnotes |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## bm/be-428-hashql-simplify-traversal-tracking-to-path-recording #8501 +/- ##
==================================================================================================
+ Coverage 62.58% 62.76% +0.18%
==================================================================================================
Files 1322 1326 +4
Lines 134806 135528 +722
Branches 5505 5523 +18
==================================================================================================
+ Hits 84367 85065 +698
- Misses 49528 49549 +21
- Partials 911 914 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Benchmark results
|
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2002 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1001 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 3314 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 1526 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 2078 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 1033 | Flame Graph |
policy_resolution_medium
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 102 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 51 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 269 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 107 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 133 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 63 | Flame Graph |
policy_resolution_none
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 8 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 3 | Flame Graph |
policy_resolution_small
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 52 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 25 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 94 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 26 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 66 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 29 | Flame Graph |
read_scaling_complete
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id;one_depth | 1 entities | Flame Graph | |
| entity_by_id;one_depth | 10 entities | Flame Graph | |
| entity_by_id;one_depth | 25 entities | Flame Graph | |
| entity_by_id;one_depth | 5 entities | Flame Graph | |
| entity_by_id;one_depth | 50 entities | Flame Graph | |
| entity_by_id;two_depth | 1 entities | Flame Graph | |
| entity_by_id;two_depth | 10 entities | Flame Graph | |
| entity_by_id;two_depth | 25 entities | Flame Graph | |
| entity_by_id;two_depth | 5 entities | Flame Graph | |
| entity_by_id;two_depth | 50 entities | Flame Graph | |
| entity_by_id;zero_depth | 1 entities | Flame Graph | |
| entity_by_id;zero_depth | 10 entities | Flame Graph | |
| entity_by_id;zero_depth | 25 entities | Flame Graph | |
| entity_by_id;zero_depth | 5 entities | Flame Graph | |
| entity_by_id;zero_depth | 50 entities | Flame Graph |
read_scaling_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1 entities | Flame Graph | |
| entity_by_id | 10 entities | Flame Graph | |
| entity_by_id | 100 entities | Flame Graph | |
| entity_by_id | 1000 entities | Flame Graph | |
| entity_by_id | 10000 entities | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1
|
Flame Graph |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba
|
Flame Graph |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_property | traversal_paths=0 | 0 | |
| entity_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=0 | 0 | |
| link_by_source_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true |
scenarios
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| full_test | query-limited | Flame Graph | |
| full_test | query-unlimited | Flame Graph | |
| linked_queries | query-limited | Flame Graph | |
| linked_queries | query-unlimited | Flame Graph |
922e42b to
572197c
Compare
7014842 to
e6439e0
Compare
572197c to
74ce977
Compare
e6439e0 to
2346ece
Compare
74ce977 to
b8fe72b
Compare
2346ece to
5e31131
Compare
|
Deployment failed with the following error: |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix prepared fixes for all 3 issues found in the latest run.
- ✅ Fixed: Cycles silently drop scheduled islands
- Added a full-node-count assertion in
schedule_inso cyclic control-flow now fails loudly instead of returning a partial schedule.
- Added a full-node-count assertion in
- ✅ Fixed: Range bound semantics are lost
- Updated
IdVec::copy_withinto preserveBound::{Included,Excluded,Unbounded}when converting typed ID bounds tousizebounds.
- Updated
- ✅ Fixed: Requirement resolution assumes acyclic control flow
- Replaced DFS-postorder-based ordering with Kahn topological ordering plus a cycle assertion in island requirement resolution.
Or push these changes by commenting:
@cursor push 27aca44a02
Preview (27aca44a02)
diff --git a/libs/@local/hashql/core/src/id/vec.rs b/libs/@local/hashql/core/src/id/vec.rs
--- a/libs/@local/hashql/core/src/id/vec.rs
+++ b/libs/@local/hashql/core/src/id/vec.rs
@@ -6,7 +6,7 @@
fmt::{self, Debug},
hash::{Hash, Hasher},
marker::PhantomData,
- ops::{Deref, DerefMut, RangeBounds},
+ ops::{Bound, Deref, DerefMut, RangeBounds},
slice,
};
@@ -510,8 +510,16 @@
where
T: Copy,
{
- let start = src.start_bound().copied().map(Id::as_usize);
- let end = src.end_bound().copied().map(Id::as_usize);
+ let start = match src.start_bound() {
+ Bound::Included(&bound) => Bound::Included(bound.as_usize()),
+ Bound::Excluded(&bound) => Bound::Excluded(bound.as_usize()),
+ Bound::Unbounded => Bound::Unbounded,
+ };
+ let end = match src.end_bound() {
+ Bound::Included(&bound) => Bound::Included(bound.as_usize()),
+ Bound::Excluded(&bound) => Bound::Excluded(bound.as_usize()),
+ Bound::Unbounded => Bound::Unbounded,
+ };
self.raw.copy_within((start, end), dst.as_usize());
}
@@ -796,3 +804,23 @@
Self::from_raw(Vec::from_iter_in(iter, alloc))
}
}
+
+#[cfg(test)]
+mod tests {
+ use super::IdVec;
+ use crate::id::{Id as _, newtype};
+
+ newtype!(#[id(crate = crate)] struct TestId(u32 is 0..=32));
+
+ #[test]
+ fn copy_within_preserves_inclusive_end_bound() {
+ let mut vec = IdVec::<TestId, i32>::from_raw(alloc::vec![10, 20, 30]);
+
+ vec.copy_within(
+ TestId::from_usize(1)..=TestId::from_usize(1),
+ TestId::from_usize(0),
+ );
+
+ assert_eq!(vec.as_slice().as_raw(), &[20, 20, 30]);
+ }
+}
diff --git a/libs/@local/hashql/mir/src/pass/execution/island/graph/mod.rs b/libs/@local/hashql/mir/src/pass/execution/island/graph/mod.rs
--- a/libs/@local/hashql/mir/src/pass/execution/island/graph/mod.rs
+++ b/libs/@local/hashql/mir/src/pass/execution/island/graph/mod.rs
@@ -20,7 +20,7 @@
#[cfg(test)]
pub(crate) mod tests;
-use alloc::alloc::Global;
+use alloc::{alloc::Global, collections::VecDeque};
use core::{
alloc::Allocator,
ops::{Index, IndexMut},
@@ -29,11 +29,10 @@
use hashql_core::{
debug_panic,
graph::{
- DirectedGraph, EdgeId, LinkedGraph, NodeId, Predecessors, Successors, Traverse as _,
+ DirectedGraph, EdgeId, LinkedGraph, NodeId, Predecessors, Successors,
algorithms::{Dominators, dominators},
linked::Edge,
},
- heap::CollectIn as _,
id::{
HasId as _, Id as _,
bit_vec::{BitMatrix, DenseBitSet},
@@ -243,13 +242,39 @@
where
S: Allocator + Clone,
{
- let mut topo: Vec<IslandId, _> = self
- .inner
- .depth_first_forest_post_order()
- .map(|node| IslandId::new(node.as_u32()))
- .collect_in(scratch.clone());
- topo.reverse();
+ #[expect(clippy::cast_possible_truncation)]
+ let node_count = self.node_count();
+ let mut in_degree = IslandVec::from_elem_in(0_u32, node_count, scratch.clone());
+ for (island_id, _) in self.iter_nodes() {
+ in_degree[island_id] = self.predecessors(island_id).count() as u32;
+ }
+ let mut queue: VecDeque<IslandId, _> = VecDeque::new_in(scratch.clone());
+ for (island_id, _) in self.iter_nodes() {
+ if in_degree[island_id] == 0 {
+ queue.push_back(island_id);
+ }
+ }
+
+ let mut topo = Vec::with_capacity_in(node_count, scratch.clone());
+ while let Some(island_id) = queue.pop_front() {
+ topo.push(island_id);
+
+ for successor in self.successors(island_id) {
+ in_degree[successor] -= 1;
+
+ if in_degree[successor] == 0 {
+ queue.push_back(successor);
+ }
+ }
+ }
+
+ assert_eq!(
+ topo.len(),
+ node_count,
+ "island requirement resolution requires acyclic control flow",
+ );
+
let start = self.lookup[BasicBlockId::START];
RequirementResolver::new(self, start, scratch).resolve(&topo);
@@ -379,7 +404,7 @@
}
fn resolve(mut self, topo: &[IslandId]) {
- // Iterate in reverse for topological order
+ // Iterate in topological order so dominators are processed before dependents.
for &island_id in topo {
let island = &self.graph[island_id];
let IslandKind::Exec(_) = &island.kind else {
diff --git a/libs/@local/hashql/mir/src/pass/execution/island/schedule/mod.rs b/libs/@local/hashql/mir/src/pass/execution/island/schedule/mod.rs
--- a/libs/@local/hashql/mir/src/pass/execution/island/schedule/mod.rs
+++ b/libs/@local/hashql/mir/src/pass/execution/island/schedule/mod.rs
@@ -122,6 +122,12 @@
}
}
+ assert_eq!(
+ entries.len(),
+ node_count,
+ "island schedule requires acyclic control flow",
+ );
+
entries.sort_by_key(|entry| entry.level);
IslandSchedule { entries }
}This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
| } | ||
|
|
||
| entries.sort_by_key(|entry| entry.level); | ||
| IslandSchedule { entries } |
There was a problem hiding this comment.
Cycles silently drop scheduled islands
High Severity
schedule_in uses Kahn’s algorithm but never checks whether all nodes were emitted. If the IslandGraph contains a control-flow cycle, queue can start empty or stall, and entries omits islands. The function still returns a seemingly valid IslandSchedule, which can cause missing execution steps.
Additional Locations (1)
| let end = src.end_bound().copied().map(Id::as_usize); | ||
|
|
||
| self.raw.copy_within((start, end), dst.as_usize()); | ||
| } |
There was a problem hiding this comment.
Range bound semantics are lost
High Severity
IdVec::copy_within strips RangeBounds boundary kind by converting bounds to plain Option<usize>. This makes inclusive and exclusive end bounds indistinguishable, so ranges like parent..=parent can become empty. In graph/mod.rs, inherited data is copied with that exact pattern, so merged_provides inheritance can silently fail.
Additional Locations (1)
| .map(|node| IslandId::new(node.as_u32())) | ||
| .collect_in(scratch.clone()); | ||
| topo.reverse(); | ||
|
|
There was a problem hiding this comment.
Requirement resolution assumes acyclic control flow
High Severity
resolve treats reversed DFS postorder as topological order without checking for cycles. In cyclic CFGs, islands can be processed before their dominator providers, so inherit_provides copies stale merged_provides and never revisits the node. This can leave required inherited paths unresolved or resolved via unnecessary data islands.




🌟 What is the purpose of this PR?
This PR implements the island dependency graph construction and scheduling system for the HashQL MIR execution engine. It builds a directed graph over computation islands, resolves data requirements between islands, and computes a topological schedule with parallelism levels for execution.
🔗 Related links
🔍 What does this change?
first_set()method toFiniteBitSet- Returns the first set bit usingtrailing_zeros(), with comprehensive test coverage for empty sets, single bits, multiple bits, and wide integral typesIdVecwith new utility methods - Addsfrom_raw(),from_domain_derive(),extend_from_slice(),append(),into_iter_enumerated(), andcopy_within()methods with detailed documentation and examplesIslandGraph) - Creates a directed graph overIslandNodes connected by three edge types:ControlFlow(execution ordering),DataFlow(data dependencies), andInherits(path inheritance between same-target dominators)IslandSchedule) - Computes topological ordering with parallelism levels using Kahn's algorithm, allowing islands at the same level to execute concurrentlyoption_into_flat_iterfeature - Adds the unstable feature flag for iterator functionalityPre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
📜 Does this require a change to the docs?
The changes in this PR:
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
🛡 What tests cover this?
FiniteBitSet::first_set()covering empty sets, single/multiple bits, edge cases❓ How to test this?
cargo test