Skip to content

[GRAPH-004] Subgraph Extraction #622

@jtnelson

Description

@jtnelson

Phase: 4

Summary

Add the ability to extract bounded subgraphs from the document graph: neighborhoods (everything within N hops of a record), induced subgraphs (the link structure connecting a set of records), and ego graphs. These operations return both the set of records and their interconnecting link structure as a first-class result type.

Motivation

  • Exploration: Users need to understand the local topology around a record without knowing the full graph structure in advance.
  • Visualization: Graph visualization tools need a bounded region of the graph, not individual records.
  • Analysis input: Graph aggregations (GRAPH-005) often operate on subgraphs, not the entire database.
  • Export: Extract a self-contained portion of the graph for external tools (Neo4j, Gephi, NetworkX).

New Result Type: Subgraph

A new first-class result type containing nodes and edges.

Java Representation

public class Subgraph {
    Set<Long> nodes();
    Set<Edge> edges();
    int nodeCount();
    int edgeCount();

    public static class Edge {
        long source();
        long destination();
        String key(); // relationship type
    }
}

Thrift Representation

struct TEdge {
    1: i64 source,
    2: i64 destination,
    3: string key
}

struct TSubgraph {
    1: set<i64> nodes,
    2: list<TEdge> edges
}

New API Methods

Neighborhood

Subgraph neighborhood(long record, int depth);
Subgraph neighborhood(long record, int depth, Collection<String> viaKeys);
Subgraph neighborhood(long record, int depth, Timestamp timestamp);
Subgraph neighborhood(Collection<Long> records, int depth);

Induced Subgraph

Subgraph induced(Collection<Long> records);
Subgraph induced(Collection<Long> records, Collection<String> viaKeys);

Algorithm Details

Neighborhood: BFS + Internal Edge Collection

  1. BFS from center record to discover all nodes within depth hops
  2. Second pass: collect ALL edges between discovered nodes (not just BFS tree edges)

Induced Subgraph

Iterate given records, collect all edges where both source and destination are in the set.

Safety: Node limit (default 100,000), mandatory depth parameter.

CCL Grammar Changes (ccl project)

New Tokens

NEIGHBORHOOD, INDUCED (reuses VIA, DEPTH from GRAPH-003)

New Command Productions

neighborhood <record> depth <n> [via <keys>] [at <timestamp>]
neighborhood <records> depth <n> [via <keys>]
induced <records> [via <keys>]

New Symbol Classes (2)

NeighborhoodSymbol, InducedSymbol

CaSH Examples

cash> neighborhood 1 depth 2
{nodes: [1, 3, 7, 12, 42], edges: [
  {source: 1, dest: 3, key: "friends"},
  {source: 1, dest: 7, key: "employer"},
  {source: 3, dest: 12, key: "friends"},
  {source: 7, dest: 42, key: "partner"},
  {source: 12, dest: 3, key: "colleague"}
]}

cash> induced 1, 3, 7, 42
{nodes: [1, 3, 7, 42], edges: [
  {source: 1, dest: 3, key: "friends"},
  {source: 1, dest: 7, key: "employer"},
  {source: 7, dest: 42, key: "partner"}
]}

Thrift API Changes

  • New structs: TEdge, TSubgraph in data.thrift
  • ~8 new method signatures in concourse.thrift
  • 2 new TCommandVerb values: NEIGHBORHOOD = 28, INDUCED = 29

Touch Surfaces

Layer File(s) Change
Thrift IDL interface/data.thrift TEdge, TSubgraph structs; 2 new verb values
Thrift IDL interface/concourse.thrift ~8 new method signatures
Thrift codegen All language stubs Regenerate
New class Subgraph.java (driver) Java model
New class GraphExtraction.java (server) Algorithm implementations
Server ConcourseServer.java ~8 handler methods
Server ops Operations.java Atomic wrappers
Server dispatch TCommandDispatcher.java 2 new verb mappings
Driver Concourse.java Abstract neighborhood, induced methods
Driver impl ConcourseThriftDriver.java Thrift client implementations
Driver Convert.java or new class TSubgraph <-> Subgraph conversion
CCL grammar ccl/grammar/grammar.jjt NEIGHBORHOOD, INDUCED tokens + productions
CCL symbols ccl/src/.../command/ 2 new symbol classes
CaSH ShellEngine.java Command dispatch + subgraph result formatting
CaSH ApiMethodCatalog.java, CompletionService.java Register + completions
Plugin API StatefulConcourseService.java New methods
All drivers Python, PHP, Ruby New methods + Subgraph type
Integration tests SubgraphExtractionTest.java
Documentation docs/guide/src/graph.md Subgraph section
Changelog CHANGELOG.md Entry under 1.0.0 (TBD)

Testing Plan

  • testNeighborhoodDepthZeroReturnsSingleNode
  • testNeighborhoodDepthOneReturnsDirectNeighbors
  • testNeighborhoodIncludesInternalEdges (edges between discovered nodes not on BFS tree)
  • testNeighborhoodRespectsViaKeys
  • testNeighborhoodRespectsNodeLimit
  • testNeighborhoodFromMultipleCentersIsUnion
  • testInducedReturnsOnlyEdgesBetweenGivenRecords
  • testInducedWithNoEdgesReturnsNodesOnly
  • testInducedRespectsViaKeys
  • testSubgraphEdgeDirectionality

Dependencies

  • Depends on: GRAPH-002 (Graph Index), GRAPH-003 (Path-Finding) -- reuses BFS infrastructure and graph index
  • Enables: GRAPH-005 (Graph Aggregations) -- many aggregations operate on extracted subgraphs

Acceptance Criteria

  • neighborhood(record, depth) returns correct node/edge set
  • induced(records) returns all edges between the given records
  • Results include internal edges (not just BFS tree)
  • viaKeys filtering works
  • Node limit prevents resource exhaustion
  • CaSH renders subgraphs readably
  • Subgraph type exists in all drivers
  • All tests pass
  • Changelog and docs updated

Complexity: Medium

Full spec: docs/specs/graph/GRAPH-004-subgraph-extraction.md

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions