Skip to content

[GRAPH-002] Graph Indexing: Adjacency Index Infrastructure #620

@jtnelson

Description

@jtnelson

Phase: 2 (Infrastructure)

Summary

Introduce a dedicated graph index structure that maintains adjacency lists for link relationships. This index enables O(1) neighbor lookups for both outgoing and incoming links, replacing the current O(N) full-table-scan implementation of trace and providing the performance foundation for path-finding, subgraph extraction, and graph aggregations.

Motivation

Current Performance Problem

The existing trace implementation (Operations.traceRecordAtomic, line 1816) iterates over every record in the entire database to find incoming links:

for (long source : atomic.getAllRecords()) {
    // select ALL data from each record, then filter for links pointing to the target...
}

This is O(N * K) where N = total records and K = average keys per record. For a database with 1M records, a single trace call reads the entire dataset. This makes any graph algorithm that repeatedly needs neighbor lookups (path-finding, BFS, connected components) completely impractical at scale.

What the Graph Index Provides

Operation Without Index With Index
trace(record) (incoming) O(N * K) O(degree)
follows(record) (outgoing) O(K) O(degree)
"Does A link to B?" O(K) O(1)
"All neighbors of A" O(N * K) O(degree)
Path-finding (BFS) Infeasible O(V + E)

Detailed Design

Adjacency Index Structure (Recommended: Standalone GraphIndex)

public class GraphIndex {
    // Outgoing adjacency: source -> {key -> {dest...}}
    private final ConcurrentHashMap<Long, Map<String, Set<Long>>> outgoing;

    // Incoming adjacency: dest -> {key -> {source...}}
    private final ConcurrentHashMap<Long, Map<String, Set<Long>>> incoming;

    void addLink(String key, long source, long destination);
    void removeLink(String key, long source, long destination);
    Map<String, Set<Long>> outgoing(long record);
    Map<String, Set<Long>> incoming(long record);
    boolean hasLink(String key, long source, long destination);
    Set<Long> neighbors(long record);
    Set<Long> neighbors(long record, String key);
}

Index Maintenance

Hooks into the write pipeline at the same point where the search index is updated:

Write Type Link Action
add(key, Link.to(dest), source) addLink(key, source, dest)
remove(key, Link.to(dest), source) removeLink(key, source, dest)
set(key, Link.to(dest), source) Remove old links on key, add new
clear(key, source) Remove all links on key from source

Historical Queries

The graph index is a current-state index. Historical queries (follows(record, timestamp) or trace(record, timestamp)) fall back to the primary storage engine's versioned data. This is acceptable for infrequent historical queries.

Persistence

Initial implementation: rebuild from primary data on startup. Optimize to disk persistence later if startup time becomes a problem.

Rewriting trace() to Use the Index

// Before: O(N) full table scan
public static Map<String, Set<Long>> traceRecordAtomic(
        long record, long timestamp, AtomicOperation atomic) {
    for (long source : atomic.getAllRecords()) { ... }
}

// After: O(degree) index lookup
public static Map<String, Set<Long>> traceRecordAtomic(
        long record, long timestamp, AtomicOperation atomic) {
    if(timestamp == Time.NONE) {
        return atomic.graphIndex().incoming(record);
    }
    else {
        return traceRecordAtomicHistorical(record, timestamp, atomic);
    }
}

CCL Grammar Changes

None required. The graph index is an internal optimization. No new user-facing syntax is needed.

Touch Surfaces

Layer File(s) Change
New class concourse-server/.../storage/GraphIndex.java New adjacency index implementation
Storage Engine.java Initialize and maintain GraphIndex; expose via accessor
Storage BufferedStore.java Hook graph index updates into write pipeline
Storage Buffer.java / Database.java Pass link writes to graph index
Server ops Operations.java Rewrite traceRecordAtomic / traceRecordsAtomic to use index
Server ops Operations.java Rewrite followsRecordAtomic (from GRAPH-001) to use index
Server ops Stores.java Optimize selectWithNavigationTracking link resolution
Server ops Stores.java Update NavigationKeyFinder to use index cardinality
Store interface Store.java Add graphIndex() accessor (with no-op default)
Atomic ops AtomicOperation.java Propagate graph index through atomic context
Compaction Compaction pipeline Ensure graph index stays consistent during segment merges
Startup ConcourseServer.java or Engine.java Rebuild graph index on first start / upgrade
Transaction Transaction rollback path Undo graph index mutations on abort
Integration tests concourse-integration-tests/ Add GraphIndexTest.java
Unit tests concourse-server/test/ Add GraphIndexTest.java for index operations
Benchmarks concourse-ete-tests/ Benchmark trace/follows with and without index

Testing Plan

Unit Tests

  • testAddLinkCreatesOutgoingEntry
  • testAddLinkCreatesIncomingEntry
  • testRemoveLinkDeletesOutgoingEntry
  • testRemoveLinkDeletesIncomingEntry
  • testHasLinkReturnsTrueForExistingLink
  • testHasLinkReturnsFalseAfterRemoval
  • testNeighborsReturnsAllDestinations
  • testNeighborsFiltersByKey
  • testConcurrentAddAndRemove
  • testRebuildFromStore

Integration Tests

  • testTraceUsesGraphIndex (verify same results, faster)
  • testFollowsUsesGraphIndex
  • testGraphIndexSurvivesRestart (rebuild)
  • testGraphIndexConsistentAfterTransaction
  • testGraphIndexConsistentAfterRollback
  • testGraphIndexConsistentDuringCompaction

Dependencies

  • Depends on: GRAPH-001 (Follows) -- the index accelerates follows() but follows() should exist first
  • Blocks: GRAPH-003 (Path-Finding), GRAPH-004 (Subgraph Extraction), GRAPH-005 (Graph Aggregations)
  • Optimizes: Existing trace(), navigate(), and consolidate()

Acceptance Criteria

  • trace(record) returns identical results before and after index, but completes in O(degree) not O(N)
  • follows(record) uses index for current-state queries
  • Historical queries still return correct results
  • Index is maintained correctly across add, remove, set, clear operations
  • Index survives server restart (via rebuild)
  • Transaction rollback undoes index mutations
  • Compaction does not corrupt the index
  • No regressions in existing test suite

Risks

  • Memory overhead: Measure footprint for databases with millions of links
  • Write amplification: Every link write now updates two structures
  • Consistency: The graph index must never diverge from the primary store -- extensive edge case testing required

Complexity: High

Full spec: docs/specs/graph/GRAPH-002-graph-indexing.md

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions