-
Notifications
You must be signed in to change notification settings - Fork 231
Open
Labels
Description
Phase: 4
Summary
Add the ability to extract bounded subgraphs from the document graph: neighborhoods (everything within N hops of a record), induced subgraphs (the link structure connecting a set of records), and ego graphs. These operations return both the set of records and their interconnecting link structure as a first-class result type.
Motivation
- Exploration: Users need to understand the local topology around a record without knowing the full graph structure in advance.
- Visualization: Graph visualization tools need a bounded region of the graph, not individual records.
- Analysis input: Graph aggregations (GRAPH-005) often operate on subgraphs, not the entire database.
- Export: Extract a self-contained portion of the graph for external tools (Neo4j, Gephi, NetworkX).
New Result Type: Subgraph
A new first-class result type containing nodes and edges.
Java Representation
public class Subgraph {
Set<Long> nodes();
Set<Edge> edges();
int nodeCount();
int edgeCount();
public static class Edge {
long source();
long destination();
String key(); // relationship type
}
}Thrift Representation
struct TEdge {
1: i64 source,
2: i64 destination,
3: string key
}
struct TSubgraph {
1: set<i64> nodes,
2: list<TEdge> edges
}New API Methods
Neighborhood
Subgraph neighborhood(long record, int depth);
Subgraph neighborhood(long record, int depth, Collection<String> viaKeys);
Subgraph neighborhood(long record, int depth, Timestamp timestamp);
Subgraph neighborhood(Collection<Long> records, int depth);Induced Subgraph
Subgraph induced(Collection<Long> records);
Subgraph induced(Collection<Long> records, Collection<String> viaKeys);Algorithm Details
Neighborhood: BFS + Internal Edge Collection
- BFS from center record to discover all nodes within
depthhops - Second pass: collect ALL edges between discovered nodes (not just BFS tree edges)
Induced Subgraph
Iterate given records, collect all edges where both source and destination are in the set.
Safety: Node limit (default 100,000), mandatory depth parameter.
CCL Grammar Changes (ccl project)
New Tokens
NEIGHBORHOOD, INDUCED (reuses VIA, DEPTH from GRAPH-003)
New Command Productions
neighborhood <record> depth <n> [via <keys>] [at <timestamp>]
neighborhood <records> depth <n> [via <keys>]
induced <records> [via <keys>]
New Symbol Classes (2)
NeighborhoodSymbol, InducedSymbol
CaSH Examples
cash> neighborhood 1 depth 2
{nodes: [1, 3, 7, 12, 42], edges: [
{source: 1, dest: 3, key: "friends"},
{source: 1, dest: 7, key: "employer"},
{source: 3, dest: 12, key: "friends"},
{source: 7, dest: 42, key: "partner"},
{source: 12, dest: 3, key: "colleague"}
]}
cash> induced 1, 3, 7, 42
{nodes: [1, 3, 7, 42], edges: [
{source: 1, dest: 3, key: "friends"},
{source: 1, dest: 7, key: "employer"},
{source: 7, dest: 42, key: "partner"}
]}
Thrift API Changes
- New structs:
TEdge,TSubgraphindata.thrift - ~8 new method signatures in
concourse.thrift - 2 new
TCommandVerbvalues:NEIGHBORHOOD = 28,INDUCED = 29
Touch Surfaces
| Layer | File(s) | Change |
|---|---|---|
| Thrift IDL | interface/data.thrift |
TEdge, TSubgraph structs; 2 new verb values |
| Thrift IDL | interface/concourse.thrift |
~8 new method signatures |
| Thrift codegen | All language stubs | Regenerate |
| New class | Subgraph.java (driver) |
Java model |
| New class | GraphExtraction.java (server) |
Algorithm implementations |
| Server | ConcourseServer.java |
~8 handler methods |
| Server ops | Operations.java |
Atomic wrappers |
| Server dispatch | TCommandDispatcher.java |
2 new verb mappings |
| Driver | Concourse.java |
Abstract neighborhood, induced methods |
| Driver impl | ConcourseThriftDriver.java |
Thrift client implementations |
| Driver | Convert.java or new class |
TSubgraph <-> Subgraph conversion |
| CCL grammar | ccl/grammar/grammar.jjt |
NEIGHBORHOOD, INDUCED tokens + productions |
| CCL symbols | ccl/src/.../command/ |
2 new symbol classes |
| CaSH | ShellEngine.java |
Command dispatch + subgraph result formatting |
| CaSH | ApiMethodCatalog.java, CompletionService.java |
Register + completions |
| Plugin API | StatefulConcourseService.java |
New methods |
| All drivers | Python, PHP, Ruby | New methods + Subgraph type |
| Integration tests | SubgraphExtractionTest.java |
|
| Documentation | docs/guide/src/graph.md |
Subgraph section |
| Changelog | CHANGELOG.md |
Entry under 1.0.0 (TBD) |
Testing Plan
testNeighborhoodDepthZeroReturnsSingleNodetestNeighborhoodDepthOneReturnsDirectNeighborstestNeighborhoodIncludesInternalEdges(edges between discovered nodes not on BFS tree)testNeighborhoodRespectsViaKeystestNeighborhoodRespectsNodeLimittestNeighborhoodFromMultipleCentersIsUniontestInducedReturnsOnlyEdgesBetweenGivenRecordstestInducedWithNoEdgesReturnsNodesOnlytestInducedRespectsViaKeystestSubgraphEdgeDirectionality
Dependencies
- Depends on: GRAPH-002 (Graph Index), GRAPH-003 (Path-Finding) -- reuses BFS infrastructure and graph index
- Enables: GRAPH-005 (Graph Aggregations) -- many aggregations operate on extracted subgraphs
Acceptance Criteria
-
neighborhood(record, depth)returns correct node/edge set -
induced(records)returns all edges between the given records - Results include internal edges (not just BFS tree)
-
viaKeysfiltering works - Node limit prevents resource exhaustion
- CaSH renders subgraphs readably
- Subgraph type exists in all drivers
- All tests pass
- Changelog and docs updated
Complexity: Medium
Full spec: docs/specs/graph/GRAPH-004-subgraph-extraction.md
Reactions are currently unavailable