Skip to content

[Feature] Support graphLookup with literal value as its start#5253

Open
qianheng-aws wants to merge 4 commits intoopensearch-project:mainfrom
qianheng-aws:graphEnhance2
Open

[Feature] Support graphLookup with literal value as its start#5253
qianheng-aws wants to merge 4 commits intoopensearch-project:mainfrom
qianheng-aws:graphEnhance2

Conversation

@qianheng-aws
Copy link
Collaborator

Description

Adds support for graphlookup as a top-level PPL command (i.e., as the first command in a query without requiring source=). In this mode, the startWith option accepts either a single literal value or a list of literal values to seed the graph traversal. It will get simpler on syntax and also more execution efficiency for there common graph search cases where the start value is already know as a literal.

Syntax

graphLookup <lookupIndex> start=<literalValue> edge=<fromField><operator><toField> [maxDepth=<maxDepth>] [depthField=<depthField>] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>
graphLookup <lookupIndex> start=(<literalValue1>, <literalValue2>, ...) edge=<fromField><operator><toField> [maxDepth=<maxDepth>] [depthField=<depthField>] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>

PPL Examples:

graphLookup employees start="Eliot" edge=reportsTo-->name as reportingHierarchy
graphLookup employees start=("Jack", "Eliot") edge=reportsTo-->name as reportingHierarchy

Response examples:

{
    "schema": [
        {
            "name": "reportingHierarchy",
            "type": "array"
        }
    ],
    "datarows": [
        [
            [
                {
                    "name": "Eliot",
                    "reportsTo": "Ron",
                    "id": 2
                },
                {
                    "name": "Ron",
                    "reportsTo": "Andrew",
                    "id": 3
                },
                {
                    "name": "Andrew",
                    "id": 4
                }
            ]
        ]
    ],
    "total": 1,
    "size": 1
}

Related Issues

Resolves #5243

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@qianheng-aws qianheng-aws added the enhancement New feature or request label Mar 23, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 23, 2026

PR Reviewer Guide 🔍

(Review updated until commit 496a851)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Core GraphLookup AST and Calcite Plan Updates for Literal Start Support

Relevant files:

  • core/src/main/java/org/opensearch/sql/ast/tree/GraphLookup.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/rel/GraphLookup.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalGraphLookup.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableGraphLookupRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java

Sub-PR theme: PPL Parser and Grammar Updates for Top-Level GraphLookup

Relevant files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java

Sub-PR theme: CalciteRelNodeVisitor GraphLookup Visitor Logic

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Sub-PR theme: Test Cases and Documentation for Top-Level GraphLookup Feature

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_graphlookup.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_graphlookup_top_level.yaml

Sub-PR theme: Documentation Updates for GraphLookup Command

Relevant files:

  • docs/user/ppl/cmd/graphlookup.md

⚡ Recommended focus areas for review

Null Pointer Risk

In literal start mode, sourceEnumerator is set to null (line 212), but later code at line 249 attempts to close it without null-checking. While there is a null-check added at line 249, the original code at line 249 in the catch block did not have this protection. The fix is present, but the pattern of conditionally initializing sourceEnumerator to null and then using it throughout the class creates fragility. If future code paths forget the null-check, NPE will occur.

// Get the source enumerator (null for literal start mode)
if (graphLookup.getStartValues() != null) {
  this.sourceEnumerator = null;
  this.startFieldIndex = -1;
} else if (graphLookup.getSource() instanceof Scannable scannable) {
  Enumerable<?> sourceEnum = scannable.scan();
  this.sourceEnumerator = (Enumerator<@Nullable Object>) sourceEnum.enumerator();
} else {
  throw new IllegalStateException(
      "Source must be Scannable, got: " + graphLookup.getSource().getClass());
}
Incomplete Error Handling

When node.getStartValues() is not null (literal start mode), the code creates a dummy LogicalValues source and sets startFieldName = null. However, if node.getChild() is not empty in this mode, the piped source is silently ignored without warning. This could mask user errors where a user accidentally provides both a piped source and literal start values. The parser should validate this mutual exclusivity earlier, or at minimum log a warning.

if (node.getStartValues() != null) {
  // Literal start mode: create empty LogicalValues as dummy source (BiRel needs two inputs)
  // And will ignore the previous pipe then.
  RelDataType dummyType =
      builder
          .getTypeFactory()
          .createStructType(
              List.of(builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)),
              List.of("_dummy"));
  builder.values(dummyType);
  startFieldName = null;
  startValuesForCalcite = new ArrayList<>();
  for (var lit : node.getStartValues()) {
    startValuesForCalcite.add(lit.getValue());
  }
} else {
  if (node.getChild().isEmpty()) {
    throw new SemanticCheckException(
        "Field reference start requires a piped source."
            + " Use literal start values (e.g. start='value') for top-level graphLookup.");
  }
  // Piped mode: visit source child
  visitChildren(node, context);
  // TODO: Limit the number of source rows to 100 for now, make it configurable.
  builder.limit(0, 100);
  if (node.isBatchMode()) {
    tryToRemoveMetaFields(context, true);
  }
  startFieldName = node.getStartField().getField().toString();
}

@github-actions
Copy link
Contributor

github-actions bot commented Mar 23, 2026

PR Code Suggestions ✨

Latest suggestions up to 496a851

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate field indices are found

The code uses indexOf() to find field indices but does not validate that the fields
exist (indexOf returns -1 if not found). This could cause silent failures or
incorrect behavior during BFS traversal. Add validation to ensure all required
fields are present in their respective row types.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [223-230]

 try {
   this.lookupFields = graphLookup.getLookup().getRowType().getFieldNames();
   this.fromFieldIdx = lookupFields.indexOf(graphLookup.fromField);
   this.toFieldIdx = lookupFields.indexOf(graphLookup.toField);
+  
+  if (this.fromFieldIdx < 0) {
+    throw new IllegalArgumentException("fromField '" + graphLookup.fromField + "' not found in lookup table");
+  }
+  if (this.toFieldIdx < 0) {
+    throw new IllegalArgumentException("toField '" + graphLookup.toField + "' not found in lookup table");
+  }
 
   if (graphLookup.getStartValues() == null) {
     List<String> sourceFields = graphLookup.getSource().getRowType().getFieldNames();
     this.startFieldIndex = sourceFields.indexOf(graphLookup.getStartField());
+    if (this.startFieldIndex < 0) {
+      throw new IllegalArgumentException("startField '" + graphLookup.getStartField() + "' not found in source table");
+    }
   }
Suggestion importance[1-10]: 8

__

Why: The suggestion adds critical validation that field indices are found (not -1) before using them in BFS traversal. This prevents silent failures and incorrect behavior during graph traversal. The validation is important for correctness and provides clear error messages when fields are missing, making it a valuable improvement to the PR code.

Medium
Add error handling for invalid start syntax

The code handles three cases but lacks an else clause to detect invalid or missing
start clause syntax. If none of the conditions match, both startField and
startValues remain null, which could cause downstream NullPointerExceptions. Add an
else clause to throw a clear error for malformed syntax.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1575-1596]

 if (startCtx.startField != null) {
   // Piped mode: start=fieldExpression
   startField = (Field) internalVisitExpression(startCtx.startField);
 } else if (startCtx.startValue != null) {
   // Top-level mode: single literal e.g. start="Jack"
   startValues = List.of((Literal) internalVisitExpression(startCtx.startValue));
 } else if (startCtx.searchLiteralList() != null) {
   // Top-level mode: literal list e.g. start=("Jack", "Eliot")
   OpenSearchPPLParser.SearchLiteralsContext listCtx =
       (OpenSearchPPLParser.SearchLiteralsContext) startCtx.searchLiteralList();
   startValues = new ArrayList<>();
   for (OpenSearchPPLParser.SearchLiteralContext lit : listCtx.searchLiteral()) {
     if (lit.stringLiteral() != null) {
       startValues.add((Literal) internalVisitExpression(lit.stringLiteral()));
     } else if (lit.numericLiteral() != null) {
       startValues.add((Literal) internalVisitExpression(lit.numericLiteral()));
     } else {
       // ID, NUMERIC_ID, searchableKeyWord — treat as string
       startValues.add(new Literal(lit.getText(), DataType.STRING));
     }
   }
+} else {
+  throw new IllegalArgumentException("Invalid start clause: must specify field, literal value, or literal list");
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion adds an else clause to catch malformed start clause syntax, which is valuable error handling. This prevents silent failures where both startField and startValues remain null. The improvement is meaningful for robustness, though the parser grammar should ideally prevent such cases from occurring in the first place.

Medium
Add null and empty check before BFS

The performBfs method is called with graphLookup.getStartValues() but there is no
null check or validation that the list is non-empty. If performBfs expects a
non-empty collection, this could cause unexpected behavior. Add defensive validation
before calling performBfs.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [280-292]

 private boolean moveNextLiteralStartMode() {
   if (literalStartCompleted) {
     return false;
   }
   literalStartCompleted = true;
 
   // Perform single BFS seeded with all literal start values
-  List<Object> bfsResults = performBfs(graphLookup.getStartValues());
+  List<Object> startValues = graphLookup.getStartValues();
+  if (startValues == null || startValues.isEmpty()) {
+    current = new Object[] {Collections.emptyList()};
+    return true;
+  }
+  List<Object> bfsResults = performBfs(startValues);
 
   // Output single row: just the hierarchy array
   current = new Object[] {bfsResults};
   return true;
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion adds defensive null and empty checks before calling performBfs(), which is reasonable error handling. However, the PR code already ensures startValues is non-null before entering literal start mode (checked at line 211), making this check somewhat redundant. The empty list check adds value but is a minor defensive improvement rather than fixing a critical issue.

Low
Validate start values and field nullability

The code should validate that startValuesForCalcite is not empty after extraction to
prevent silent failures with empty literal lists. Additionally, consider adding a
null check before calling node.getStartField().getField() in piped mode to provide
clearer error messages if the field is unexpectedly null.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2697-2728]

 List<Object> startValuesForCalcite = null;
 String startFieldName;
 if (node.getStartValues() != null) {
   // Literal start mode: create empty LogicalValues as dummy source (BiRel needs two inputs)
   // And will ignore the previous pipe then.
   RelDataType dummyType =
       builder
           .getTypeFactory()
           .createStructType(
               List.of(builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)),
               List.of("_dummy"));
   builder.values(dummyType);
   startFieldName = null;
   startValuesForCalcite = new ArrayList<>();
   for (var lit : node.getStartValues()) {
     startValuesForCalcite.add(lit.getValue());
   }
+  if (startValuesForCalcite.isEmpty()) {
+    throw new SemanticCheckException("Start values list cannot be empty for graphLookup.");
+  }
 } else {
   if (node.getChild().isEmpty()) {
     throw new SemanticCheckException(
         "Field reference start requires a piped source."
             + " Use literal start values (e.g. start='value') for top-level graphLookup.");
   }
   // Piped mode: visit source child
   visitChildren(node, context);
   // TODO: Limit the number of source rows to 100 for now, make it configurable.
   builder.limit(0, 100);
   if (node.isBatchMode()) {
     tryToRemoveMetaFields(context, true);
   }
+  if (node.getStartField() == null) {
+    throw new SemanticCheckException("Start field cannot be null in piped mode.");
+  }
   startFieldName = node.getStartField().getField().toString();
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion adds defensive validation for empty startValuesForCalcite and null startField, which could prevent runtime errors. However, the empty list check may be overly strict since the parser should already validate syntax, and the null check for startField in piped mode is reasonable but represents a defensive measure rather than addressing a critical bug in the PR code.

Low

Previous suggestions

Suggestions up to commit c9258d9
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null dereference in current()

current() is called before moveNext() returns true in some Calcite enumerator
patterns, and current is initialized to null. Accessing current[0] when current is
null will throw a NullPointerException. Add a null guard consistent with how the
rest of the enumerator handles uninitialized state.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [257-264]

 @Override
 public Object current() {
   // Literal start mode: single column output, Calcite expects scalar value
   if (graphLookup.getStartValues() != null) {
-    return current[0];
+    return current != null ? current[0] : null;
   }
   // source fields + output array (normal mode) or [source array, lookup array] (batch mode)
   return current;
 }
Suggestion importance[1-10]: 5

__

Why: The current field is initialized to null and current() could be called before moveNext() in some patterns. Accessing current[0] when current is null would throw a NullPointerException. The null guard is a valid defensive improvement, though Calcite's contract typically requires moveNext() to be called first.

Low
Add error handling for unrecognized start clause

If none of the three branches match (e.g., the grammar produces an unexpected parse
tree), both startField and startValues remain null, which will silently produce a
broken GraphLookup node. Add an else branch that throws a descriptive
SemanticCheckException to fail fast with a clear error message.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1575-1596]

 if (startCtx.startField != null) {
   // Piped mode: start=fieldExpression
   startField = (Field) internalVisitExpression(startCtx.startField);
 } else if (startCtx.startValue != null) {
   // Top-level mode: single literal e.g. start="Jack"
   startValues = List.of((Literal) internalVisitExpression(startCtx.startValue));
 } else if (startCtx.searchLiteralList() != null) {
-  ...
+  // Top-level mode: literal list e.g. start=("Jack", "Eliot")
+  OpenSearchPPLParser.SearchLiteralsContext listCtx =
+      (OpenSearchPPLParser.SearchLiteralsContext) startCtx.searchLiteralList();
+  startValues = new ArrayList<>();
+  for (OpenSearchPPLParser.SearchLiteralContext lit : listCtx.searchLiteral()) {
+    if (lit.stringLiteral() != null) {
+      startValues.add((Literal) internalVisitExpression(lit.stringLiteral()));
+    } else if (lit.numericLiteral() != null) {
+      startValues.add((Literal) internalVisitExpression(lit.numericLiteral()));
+    } else {
+      startValues.add(new Literal(lit.getText(), DataType.STRING));
+    }
+  }
+} else {
+  throw new SemanticCheckException(
+      "graphLookup: start clause must specify a field reference or literal value(s).");
 }
Suggestion importance[1-10]: 4

__

Why: While adding an else branch for unrecognized start clauses is a reasonable defensive measure, the grammar in OpenSearchPPLParser.g4 already constrains the valid alternatives, making this case practically unreachable. The suggestion is valid but has low practical impact.

Low
General
Validate literal start list is non-empty

After parsing the literal list, startValues could be empty if searchLiteralList
matched but contained no literals. An empty startValues list would cause a BFS with
no seeds, producing a confusing empty result. Add a validation check to throw a
SemanticCheckException when the list is empty.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1581-1596]

 } else if (startCtx.searchLiteralList() != null) {
-  // Top-level mode: literal list e.g. start=("Jack", "Eliot")
   OpenSearchPPLParser.SearchLiteralsContext listCtx =
       (OpenSearchPPLParser.SearchLiteralsContext) startCtx.searchLiteralList();
   startValues = new ArrayList<>();
   for (OpenSearchPPLParser.SearchLiteralContext lit : listCtx.searchLiteral()) {
-    ...
+    if (lit.stringLiteral() != null) {
+      startValues.add((Literal) internalVisitExpression(lit.stringLiteral()));
+    } else if (lit.numericLiteral() != null) {
+      startValues.add((Literal) internalVisitExpression(lit.numericLiteral()));
+    } else {
+      startValues.add(new Literal(lit.getText(), DataType.STRING));
+    }
+  }
+  if (startValues.isEmpty()) {
+    throw new SemanticCheckException(
+        "graphLookup: start literal list must contain at least one value.");
   }
 }
Suggestion importance[1-10]: 4

__

Why: An empty startValues list after parsing searchLiteralList is theoretically possible if the grammar allows it, but the grammar rule searchLiteralList likely requires at least one element. The validation is a reasonable defensive check but has limited practical impact.

Low
Suggestions up to commit 98d35f4
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix grammar rule ordering for correct literal parsing

The grammar rule ordering matters for ANTLR parsing. fieldExpression likely matches
literalValue as well (since a literal can be a valid field expression), so the
startField = fieldExpression alternative may consume literal values before the
startValue = literalValue alternative is tried. The startValue = literalValue
alternative should be placed before startField = fieldExpression to ensure literals
are matched first.

ppl/src/main/antlr/OpenSearchPPLParser.g4 [663-667]

 startClause
    : START EQUAL LT_PRTHS searchLiteralList RT_PRTHS
+   | START EQUAL startValue = literalValue
    | START EQUAL startField = fieldExpression
-   | START EQUAL startValue = literalValue
    ;
Suggestion importance[1-10]: 7

__

Why: ANTLR uses ordered choice, so if fieldExpression can match literals (e.g., identifiers or quoted strings), the startField = fieldExpression alternative would consume literal values before startValue = literalValue is tried. Placing literalValue before fieldExpression ensures correct disambiguation.

Medium
General
Validate startFieldIndex is found in source schema

In literal start mode, sourceEnumerator is set to null, but startFieldIndex is
declared as private int (non-final after the PR change). The field is initialized to
0 by default (Java default for int), and only set to -1 in the literal start branch.
However, in moveNextNormalMode() and moveNextBatchMode(), startFieldIndex is used to
index into source rows. If somehow the wrong mode is entered, index 0 could be used
silently. The -1 assignment is correct, but the field should remain final or be
validated before use to prevent misuse.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [211-220]

-if (graphLookup.getStartValues() != null) {
-    this.sourceEnumerator = null;
-    this.startFieldIndex = -1;
-  } else if (graphLookup.getSource() instanceof Scannable scannable) {
-    Enumerable<?> sourceEnum = scannable.scan();
-    this.sourceEnumerator = (Enumerator<@Nullable Object>) sourceEnum.enumerator();
-  } else {
+// In the constructor, after the null check:
+if (graphLookup.getStartValues() == null) {
+  List<String> sourceFields = graphLookup.getSource().getRowType().getFieldNames();
+  this.startFieldIndex = sourceFields.indexOf(graphLookup.getStartField());
+  if (this.startFieldIndex < 0) {
     throw new IllegalStateException(
-        "Source must be Scannable, got: " + graphLookup.getSource().getClass());
+        "Start field '" + graphLookup.getStartField() + "' not found in source schema: " + sourceFields);
   }
+}
Suggestion importance[1-10]: 5

__

Why: The improved_code restructures the startFieldIndex initialization with a validation check, which is a good defensive practice. If startField is not found in the source schema, indexOf returns -1, which would cause an ArrayIndexOutOfBoundsException at runtime rather than a clear error message.

Low
Validate that literal start values list is non-empty

builder.values(dummyType) with no rows creates an empty LogicalValues, but the
comment says it's a "dummy source". However, startValuesForCalcite is initialized as
an empty list and then populated, but if node.getStartValues() is an empty list (not
null), the BFS will have no seeds and produce no results without any error. Consider
adding a check that startValues is non-empty before proceeding.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2712-2725]

 if (node.getStartValues() != null) {
+  if (node.getStartValues().isEmpty()) {
+    throw new SemanticCheckException(
+        "graphLookup start literal list must contain at least one value.");
+  }
   // Literal start mode: create empty LogicalValues as dummy source (BiRel needs two inputs)
   RelDataType dummyType =
       builder
           .getTypeFactory()
           .createStructType(
               List.of(builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)),
               List.of("_dummy"));
   builder.values(dummyType);
Suggestion importance[1-10]: 4

__

Why: An empty startValues list (non-null) would silently produce no BFS results without any error. Adding an explicit check improves user experience by providing a clear error message, though the grammar makes this case unlikely in practice.

Low
Add validation for missing start clause alternatives

If neither startField, startValue, nor searchLiteralList is present, both startField
and startValues remain null, which could lead to a NullPointerException or silent
incorrect behavior downstream. An explicit validation should be added to throw a
meaningful error when none of the expected start clause alternatives are matched.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1575-1596]

 if (startCtx.startField != null) {
   // Piped mode: start=fieldExpression
   startField = (Field) internalVisitExpression(startCtx.startField);
 } else if (startCtx.startValue != null) {
   // Top-level mode: single literal e.g. start="Jack"
   startValues = List.of((Literal) internalVisitExpression(startCtx.startValue));
 } else if (startCtx.searchLiteralList() != null) {
+  // Top-level mode: literal list e.g. start=("Jack", "Eliot")
+  ...
+} else {
+  throw new SemanticCheckException(
+      "graphLookup requires a start clause: either a field reference or literal value(s).");
+}
Suggestion importance[1-10]: 3

__

Why: While the grammar enforces that one of the three alternatives must match, adding a defensive else-branch with a clear error message is a minor robustness improvement. In practice, ANTLR would fail to parse before reaching this code if none of the alternatives match.

Low
Suggestions up to commit dd282e5
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add fallback error handling for missing start clause

There is no validation that at least one of startField, startValue, or
searchLiteralList() is non-null. If the grammar somehow produces a startClause where
all three are null (e.g., due to a future grammar change or error recovery), both
startField and startValues will remain null, leading to a silent failure or a
NullPointerException downstream. Add an else branch to throw a descriptive
parse/semantic exception.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1575-1596]

 if (startCtx.startField != null) {
   // Piped mode: start=fieldExpression
   startField = (Field) internalVisitExpression(startCtx.startField);
 } else if (startCtx.startValue != null) {
   // Top-level mode: single literal e.g. start="Jack"
   startValues = List.of((Literal) internalVisitExpression(startCtx.startValue));
 } else if (startCtx.searchLiteralList() != null) {
+  // Top-level mode: literal list e.g. start=("Jack", "Eliot")
   ...
+} else {
+  throw new SemanticCheckException(
+      "graphLookup: 'start' clause must specify a field reference or literal value(s).");
 }
Suggestion importance[1-10]: 4

__

Why: The grammar enforces that one of the three alternatives must match, so in practice this else branch would never be reached. However, it's a reasonable defensive measure for future grammar changes. The improvement is minor since the grammar already constrains valid inputs.

Low
General
Validate that literal start values list is non-empty

If node.getStartValues() is a non-null but empty list (which is technically possible
if the grammar parses start=() or similar edge case), startValuesForCalcite will be
an empty list and performBfs will be called with no seeds, potentially causing
unexpected behavior or an empty BFS that silently returns no results. Add a
validation check to ensure startValues is non-empty before proceeding.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2712-2725]

 if (node.getStartValues() != null) {
+  if (node.getStartValues().isEmpty()) {
+    throw new SemanticCheckException(
+        "graphLookup: 'start' literal list must contain at least one value.");
+  }
   // Literal start mode: create empty LogicalValues as dummy source (BiRel needs two inputs)
   RelDataType dummyType =
       builder
           .getTypeFactory()
           .createStructType(
               List.of(builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)),
               List.of("_dummy"));
   builder.values(dummyType);
   startFieldName = null;
   startValuesForCalcite = new ArrayList<>();
   for (var lit : node.getStartValues()) {
     startValuesForCalcite.add(lit.getValue());
   }
 }
Suggestion importance[1-10]: 3

__

Why: While an empty startValues list is an edge case that the grammar likely prevents, adding this guard provides a clearer error message. The impact is low since the grammar's searchLiteralList rule would require at least one literal, making this scenario unlikely in practice.

Low
Clarify empty BFS result behavior in literal start mode

The reset() method resets literalStartCompleted to false, but current is also reset
to null. However, if reset() is called and then moveNext() is invoked again,
performBfs will be called a second time. This is correct behavior, but the BFS
result from the first call is discarded without any issue. The real concern is that
current() returns current[0] for literal start mode, but if moveNext() returns false
(after completion), current() should not be called — this is standard Enumerator
contract. The code is correct in this regard, but it's worth confirming that
bfsResults being an empty list (when no nodes are found) still produces a valid
single-row output rather than being treated as "no rows". The current implementation
always returns true once (even for empty BFS results), which correctly produces one
row with an empty array — this matches the test
testTopLevelGraphLookupNonExistentStart. No change needed here, but verify this is
the intended behavior per the spec.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [280-292]

 private boolean moveNextLiteralStartMode() {
   if (literalStartCompleted) {
     return false;
   }
   literalStartCompleted = true;
 
   // Perform single BFS seeded with all literal start values
   List<Object> bfsResults = performBfs(graphLookup.getStartValues());
 
-  // Output single row: just the hierarchy array
+  // Output single row: just the hierarchy array (empty list if no results found)
   current = new Object[] {bfsResults};
   return true;
 }
Suggestion importance[1-10]: 1

__

Why: This suggestion only asks to verify existing behavior and the improved_code is essentially identical to the existing_code with just a comment change. It provides no actionable code improvement.

Low
Suggestions up to commit d97bd7b
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add error handling for unmatched start clause

If none of the grammar alternatives match (e.g., due to a parser bug or future
grammar change), both startField and startValues remain null, which will silently
produce an invalid GraphLookup node. Add an else branch that throws a descriptive
error to fail fast in this case.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1575-1596]

 if (startCtx.startField != null) {
   // Piped mode: start=fieldExpression
   startField = (Field) internalVisitExpression(startCtx.startField);
 } else if (startCtx.startValue != null) {
   // Top-level mode: single literal e.g. start="Jack"
   startValues = List.of((Literal) internalVisitExpression(startCtx.startValue));
 } else if (startCtx.searchLiteralList() != null) {
   ...
+} else {
+  throw new SemanticCheckException(
+      "Invalid start clause: expected a field reference or literal value(s).");
 }
-// No else: if none match, both startField and startValues remain null
Suggestion importance[1-10]: 5

__

Why: The suggestion is valid - if none of the grammar alternatives match, both startField and startValues remain null, which could produce an invalid GraphLookup node. However, since the grammar is well-defined with exactly three alternatives, this case should be unreachable in practice, making this a defensive coding improvement rather than a critical fix.

Low
General
Initialize sentinel field value at declaration

startFieldIndex is declared as a non-final instance field and initialized to -1 for
literal start mode or computed from source fields for piped mode. However, since it
is only meaningful in piped mode, leaving it as a mutable field without a clear
sentinel check could lead to accidental use of the uninitialized value (0 by default
before the constructor sets it). The field should be initialized to -1 at
declaration to make the sentinel explicit and safe.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [186]

-private int startFieldIndex;
+private int startFieldIndex = -1;
Suggestion importance[1-10]: 4

__

Why: The field startFieldIndex is already set to -1 in the constructor for literal start mode (line 213), so the default Java value of 0 is only present briefly before the constructor runs. Initializing at declaration makes the sentinel explicit and improves readability, but is a minor defensive improvement.

Low
Validate non-empty literal start values list

When node.getStartValues() is not null but is an empty list, startValuesForCalcite
will be an empty list, which will cause the BFS to be seeded with no values and
silently return empty results. This edge case should be validated early with a clear
error message rather than silently producing no output.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2712-2725]

 if (node.getStartValues() != null) {
+  if (node.getStartValues().isEmpty()) {
+    throw new SemanticCheckException(
+        "Literal start values list cannot be empty. Provide at least one start value.");
+  }
   // Literal start mode: create empty LogicalValues as dummy source (BiRel needs two inputs)
   RelDataType dummyType =
       builder
           .getTypeFactory()
           .createStructType(
               List.of(builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)),
               List.of("_dummy"));
   builder.values(dummyType);
   startFieldName = null;
   startValuesForCalcite = new ArrayList<>();
   for (var lit : node.getStartValues()) {
     startValuesForCalcite.add(lit.getValue());
   }
 } else {
Suggestion importance[1-10]: 4

__

Why: An empty startValues list is technically possible if the grammar allows it, and would silently produce empty results. Adding an early validation with a clear error message improves user experience, though the grammar's searchLiteralList rule likely requires at least one element, making this an edge case defensive check.

Low

…oject#5243)

Add support for graphLookup as the first command in a PPL query with
literal start values, instead of requiring piped input from source=.

Syntax:
  graphLookup table start="value" edge=from-->to as output
  graphLookup table start=("v1", "v2") edge=from-->to as output

Signed-off-by: Heng Qian <qianheng@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit dd282e5

Signed-off-by: Heng Qian <qianheng@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 98d35f4

@LantaoJin
Copy link
Member

LantaoJin commented Mar 23, 2026

In this mode, the startWith option accepts either a single literal value or a list of literal values

Q: Why only top-level command could accept literal value?
Couldn't be source=employees | graphLookup employees start=("Jack", "Eliot") edge=reportsTo-->name as reportingHierarchy?

Signed-off-by: Heng Qian <qianheng@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit c9258d9

@srikanthpadakanti
Copy link
Contributor

Hello @qianheng-aws
I noticed a few items from the PPL Command Development Checklist that are still pending:
Explain test
V2 unsupported test
docs/category.json and graphlookup.md

Were these intentionally left out?

@qianheng-aws
Copy link
Collaborator Author

Hello @qianheng-aws I noticed a few items from the PPL Command Development Checklist that are still pending: Explain test V2 unsupported test docs/category.json and graphlookup.md

Were these intentionally left out?

No, the agent didn't notice the checklist. Will call out this in the CLAUDE.md. Plan to init a shared CLAUDE.md for this repo #5242


startClause
: START EQUAL startField = fieldExpression
: START EQUAL LT_PRTHS searchLiteralList RT_PRTHS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove LT_PRTHS and RT_PRTHS. Comma-list is enough

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IN phrase has PRTH as well. Shall then be consistent?

@qianheng-aws qianheng-aws changed the title [Feature] Support graphLookup as top-level PPL command [Feature] Support graphLookup with literal value as its start Mar 24, 2026
- Add explain plan tests in CalciteExplainIT with YAML assertions
- Add v2-unsupported tests in NewAddedCommandsIT
- Add CalcitePPLGraphLookupIT to CalciteNoPushdownIT suite
- Skip graphLookup tests when pushdown is disabled (required by impl)
- Add expected plan YAML files for piped and top-level graphLookup

Signed-off-by: Heng Qian <qianheng@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 496a851

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Support graphlookup as a top-level PPL command

3 participants