-
Notifications
You must be signed in to change notification settings - Fork 231
Description
Phase: 2 (Infrastructure)
Summary
Introduce a dedicated graph index structure that maintains adjacency lists for link relationships. This index enables O(1) neighbor lookups for both outgoing and incoming links, replacing the current O(N) full-table-scan implementation of trace and providing the performance foundation for path-finding, subgraph extraction, and graph aggregations.
Motivation
Current Performance Problem
The existing trace implementation (Operations.traceRecordAtomic, line 1816) iterates over every record in the entire database to find incoming links:
for (long source : atomic.getAllRecords()) {
// select ALL data from each record, then filter for links pointing to the target...
}This is O(N * K) where N = total records and K = average keys per record. For a database with 1M records, a single trace call reads the entire dataset. This makes any graph algorithm that repeatedly needs neighbor lookups (path-finding, BFS, connected components) completely impractical at scale.
What the Graph Index Provides
| Operation | Without Index | With Index |
|---|---|---|
trace(record) (incoming) |
O(N * K) | O(degree) |
follows(record) (outgoing) |
O(K) | O(degree) |
| "Does A link to B?" | O(K) | O(1) |
| "All neighbors of A" | O(N * K) | O(degree) |
| Path-finding (BFS) | Infeasible | O(V + E) |
Detailed Design
Adjacency Index Structure (Recommended: Standalone GraphIndex)
public class GraphIndex {
// Outgoing adjacency: source -> {key -> {dest...}}
private final ConcurrentHashMap<Long, Map<String, Set<Long>>> outgoing;
// Incoming adjacency: dest -> {key -> {source...}}
private final ConcurrentHashMap<Long, Map<String, Set<Long>>> incoming;
void addLink(String key, long source, long destination);
void removeLink(String key, long source, long destination);
Map<String, Set<Long>> outgoing(long record);
Map<String, Set<Long>> incoming(long record);
boolean hasLink(String key, long source, long destination);
Set<Long> neighbors(long record);
Set<Long> neighbors(long record, String key);
}Index Maintenance
Hooks into the write pipeline at the same point where the search index is updated:
| Write Type | Link Action |
|---|---|
add(key, Link.to(dest), source) |
addLink(key, source, dest) |
remove(key, Link.to(dest), source) |
removeLink(key, source, dest) |
set(key, Link.to(dest), source) |
Remove old links on key, add new |
clear(key, source) |
Remove all links on key from source |
Historical Queries
The graph index is a current-state index. Historical queries (follows(record, timestamp) or trace(record, timestamp)) fall back to the primary storage engine's versioned data. This is acceptable for infrequent historical queries.
Persistence
Initial implementation: rebuild from primary data on startup. Optimize to disk persistence later if startup time becomes a problem.
Rewriting trace() to Use the Index
// Before: O(N) full table scan
public static Map<String, Set<Long>> traceRecordAtomic(
long record, long timestamp, AtomicOperation atomic) {
for (long source : atomic.getAllRecords()) { ... }
}
// After: O(degree) index lookup
public static Map<String, Set<Long>> traceRecordAtomic(
long record, long timestamp, AtomicOperation atomic) {
if(timestamp == Time.NONE) {
return atomic.graphIndex().incoming(record);
}
else {
return traceRecordAtomicHistorical(record, timestamp, atomic);
}
}CCL Grammar Changes
None required. The graph index is an internal optimization. No new user-facing syntax is needed.
Touch Surfaces
| Layer | File(s) | Change |
|---|---|---|
| New class | concourse-server/.../storage/GraphIndex.java |
New adjacency index implementation |
| Storage | Engine.java |
Initialize and maintain GraphIndex; expose via accessor |
| Storage | BufferedStore.java |
Hook graph index updates into write pipeline |
| Storage | Buffer.java / Database.java |
Pass link writes to graph index |
| Server ops | Operations.java |
Rewrite traceRecordAtomic / traceRecordsAtomic to use index |
| Server ops | Operations.java |
Rewrite followsRecordAtomic (from GRAPH-001) to use index |
| Server ops | Stores.java |
Optimize selectWithNavigationTracking link resolution |
| Server ops | Stores.java |
Update NavigationKeyFinder to use index cardinality |
| Store interface | Store.java |
Add graphIndex() accessor (with no-op default) |
| Atomic ops | AtomicOperation.java |
Propagate graph index through atomic context |
| Compaction | Compaction pipeline | Ensure graph index stays consistent during segment merges |
| Startup | ConcourseServer.java or Engine.java |
Rebuild graph index on first start / upgrade |
| Transaction | Transaction rollback path | Undo graph index mutations on abort |
| Integration tests | concourse-integration-tests/ |
Add GraphIndexTest.java |
| Unit tests | concourse-server/test/ |
Add GraphIndexTest.java for index operations |
| Benchmarks | concourse-ete-tests/ |
Benchmark trace/follows with and without index |
Testing Plan
Unit Tests
testAddLinkCreatesOutgoingEntrytestAddLinkCreatesIncomingEntrytestRemoveLinkDeletesOutgoingEntrytestRemoveLinkDeletesIncomingEntrytestHasLinkReturnsTrueForExistingLinktestHasLinkReturnsFalseAfterRemovaltestNeighborsReturnsAllDestinationstestNeighborsFiltersByKeytestConcurrentAddAndRemovetestRebuildFromStore
Integration Tests
testTraceUsesGraphIndex(verify same results, faster)testFollowsUsesGraphIndextestGraphIndexSurvivesRestart(rebuild)testGraphIndexConsistentAfterTransactiontestGraphIndexConsistentAfterRollbacktestGraphIndexConsistentDuringCompaction
Dependencies
- Depends on: GRAPH-001 (Follows) -- the index accelerates
follows()butfollows()should exist first - Blocks: GRAPH-003 (Path-Finding), GRAPH-004 (Subgraph Extraction), GRAPH-005 (Graph Aggregations)
- Optimizes: Existing
trace(),navigate(), andconsolidate()
Acceptance Criteria
-
trace(record)returns identical results before and after index, but completes in O(degree) not O(N) -
follows(record)uses index for current-state queries - Historical queries still return correct results
- Index is maintained correctly across add, remove, set, clear operations
- Index survives server restart (via rebuild)
- Transaction rollback undoes index mutations
- Compaction does not corrupt the index
- No regressions in existing test suite
Risks
- Memory overhead: Measure footprint for databases with millions of links
- Write amplification: Every link write now updates two structures
- Consistency: The graph index must never diverge from the primary store -- extensive edge case testing required
Complexity: High
Full spec: docs/specs/graph/GRAPH-002-graph-indexing.md