[Feature] Support graphLookup with literal value as its start by qianheng-aws · Pull Request #5253 · opensearch-project/sql

qianheng-aws · 2026-03-23T07:29:56Z

Description

Adds support for graphlookup as a top-level PPL command (i.e., as the first command in a query without requiring source=). In this mode, the startWith option accepts either a single literal value or a list of literal values to seed the graph traversal. It will get simpler on syntax and also more execution efficiency for there common graph search cases where the start value is already know as a literal.

Syntax

graphLookup <lookupIndex> start=<literalValue> edge=<fromField><operator><toField> [maxDepth=<maxDepth>] [depthField=<depthField>] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>
graphLookup <lookupIndex> start=(<literalValue1>, <literalValue2>, ...) edge=<fromField><operator><toField> [maxDepth=<maxDepth>] [depthField=<depthField>] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>

PPL Examples:

graphLookup employees start="Eliot" edge=reportsTo-->name as reportingHierarchy
graphLookup employees start=("Jack", "Eliot") edge=reportsTo-->name as reportingHierarchy

Response examples:

{
    "schema": [
        {
            "name": "reportingHierarchy",
            "type": "array"
        }
    ],
    "datarows": [
        [
            [
                {
                    "name": "Eliot",
                    "reportsTo": "Ron",
                    "id": 2
                },
                {
                    "name": "Ron",
                    "reportsTo": "Andrew",
                    "id": 3
                },
                {
                    "name": "Andrew",
                    "id": 4
                }
            ]
        ]
    ],
    "total": 1,
    "size": 1
}

Related Issues

Resolves #5243

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-03-23T07:31:29Z

PR Reviewer Guide 🔍

(Review updated until commit `496a851`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [2722]: Limit the number of source rows to 100 for now, make it configurable.
🔀 Multiple PR themes Sub-PR theme: Core GraphLookup AST and Calcite Plan Updates for Literal Start Support Relevant files: core/src/main/java/org/opensearch/sql/ast/tree/GraphLookup.java core/src/main/java/org/opensearch/sql/calcite/plan/rel/GraphLookup.java core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalGraphLookup.java opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableGraphLookupRule.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java Sub-PR theme: PPL Parser and Grammar Updates for Top-Level GraphLookup Relevant files: ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java ppl/src/main/antlr/OpenSearchPPLParser.g4 ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java Sub-PR theme: CalciteRelNodeVisitor GraphLookup Visitor Logic Relevant files: core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java Sub-PR theme: Test Cases and Documentation for Top-Level GraphLookup Feature Relevant files: integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java integ-test/src/test/resources/expectedOutput/calcite/explain_graphlookup.yaml integ-test/src/test/resources/expectedOutput/calcite/explain_graphlookup_top_level.yaml Sub-PR theme: Documentation Updates for GraphLookup Command Relevant files: docs/user/ppl/cmd/graphlookup.md
⚡ Recommended focus areas for review Null Pointer Risk In literal start mode, `sourceEnumerator` is set to null (line 212), but later code at line 249 attempts to close it without null-checking. While there is a null-check added at line 249, the original code at line 249 in the catch block did not have this protection. The fix is present, but the pattern of conditionally initializing `sourceEnumerator` to null and then using it throughout the class creates fragility. If future code paths forget the null-check, NPE will occur. // Get the source enumerator (null for literal start mode) if (graphLookup.getStartValues() != null) { this.sourceEnumerator = null; this.startFieldIndex = -1; } else if (graphLookup.getSource() instanceof Scannable scannable) { Enumerable<?> sourceEnum = scannable.scan(); this.sourceEnumerator = (Enumerator<@Nullable Object>) sourceEnum.enumerator(); } else { throw new IllegalStateException( "Source must be Scannable, got: " + graphLookup.getSource().getClass()); } Incomplete Error Handling When `node.getStartValues()` is not null (literal start mode), the code creates a dummy LogicalValues source and sets `startFieldName = null`. However, if `node.getChild()` is not empty in this mode, the piped source is silently ignored without warning. This could mask user errors where a user accidentally provides both a piped source and literal start values. The parser should validate this mutual exclusivity earlier, or at minimum log a warning. if (node.getStartValues() != null) { // Literal start mode: create empty LogicalValues as dummy source (BiRel needs two inputs) // And will ignore the previous pipe then. RelDataType dummyType = builder .getTypeFactory() .createStructType( List.of(builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)), List.of("_dummy")); builder.values(dummyType); startFieldName = null; startValuesForCalcite = new ArrayList<>(); for (var lit : node.getStartValues()) { startValuesForCalcite.add(lit.getValue()); } } else { if (node.getChild().isEmpty()) { throw new SemanticCheckException( "Field reference start requires a piped source." + " Use literal start values (e.g. start='value') for top-level graphLookup."); } // Piped mode: visit source child visitChildren(node, context); // TODO: Limit the number of source rows to 100 for now, make it configurable. builder.limit(0, 100); if (node.isBatchMode()) { tryToRemoveMetaFields(context, true); } startFieldName = node.getStartField().getField().toString(); }

github-actions · 2026-03-23T07:32:03Z

PR Code Suggestions ✨

Latest suggestions up to 496a851

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Validate field indices are found The code uses `indexOf()` to find field indices but does not validate that the fields exist (indexOf returns -1 if not found). This could cause silent failures or incorrect behavior during BFS traversal. Add validation to ensure all required fields are present in their respective row types. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [223-230] try { this.lookupFields = graphLookup.getLookup().getRowType().getFieldNames(); this.fromFieldIdx = lookupFields.indexOf(graphLookup.fromField); this.toFieldIdx = lookupFields.indexOf(graphLookup.toField); + + if (this.fromFieldIdx < 0) { + throw new IllegalArgumentException("fromField '" + graphLookup.fromField + "' not found in lookup table"); + } + if (this.toFieldIdx < 0) { + throw new IllegalArgumentException("toField '" + graphLookup.toField + "' not found in lookup table"); + } if (graphLookup.getStartValues() == null) { List<String> sourceFields = graphLookup.getSource().getRowType().getFieldNames(); this.startFieldIndex = sourceFields.indexOf(graphLookup.getStartField()); + if (this.startFieldIndex < 0) { + throw new IllegalArgumentException("startField '" + graphLookup.getStartField() + "' not found in source table"); + } } Suggestion importance[1-10]: 8 __ Why: The suggestion adds critical validation that field indices are found (not -1) before using them in BFS traversal. This prevents silent failures and incorrect behavior during graph traversal. The validation is important for correctness and provides clear error messages when fields are missing, making it a valuable improvement to the PR code.	Medium
	Add error handling for invalid start syntax The code handles three cases but lacks an else clause to detect invalid or missing start clause syntax. If none of the conditions match, both `startField` and `startValues` remain null, which could cause downstream NullPointerExceptions. Add an else clause to throw a clear error for malformed syntax. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1575-1596] if (startCtx.startField != null) { // Piped mode: start=fieldExpression startField = (Field) internalVisitExpression(startCtx.startField); } else if (startCtx.startValue != null) { // Top-level mode: single literal e.g. start="Jack" startValues = List.of((Literal) internalVisitExpression(startCtx.startValue)); } else if (startCtx.searchLiteralList() != null) { // Top-level mode: literal list e.g. start=("Jack", "Eliot") OpenSearchPPLParser.SearchLiteralsContext listCtx = (OpenSearchPPLParser.SearchLiteralsContext) startCtx.searchLiteralList(); startValues = new ArrayList<>(); for (OpenSearchPPLParser.SearchLiteralContext lit : listCtx.searchLiteral()) { if (lit.stringLiteral() != null) { startValues.add((Literal) internalVisitExpression(lit.stringLiteral())); } else if (lit.numericLiteral() != null) { startValues.add((Literal) internalVisitExpression(lit.numericLiteral())); } else { // ID, NUMERIC_ID, searchableKeyWord — treat as string startValues.add(new Literal(lit.getText(), DataType.STRING)); } } +} else { + throw new IllegalArgumentException("Invalid start clause: must specify field, literal value, or literal list"); } Suggestion importance[1-10]: 7 __ Why: The suggestion adds an else clause to catch malformed start clause syntax, which is valuable error handling. This prevents silent failures where both `startField` and `startValues` remain null. The improvement is meaningful for robustness, though the parser grammar should ideally prevent such cases from occurring in the first place.	Medium
	Add null and empty check before BFS The `performBfs` method is called with `graphLookup.getStartValues()` but there is no null check or validation that the list is non-empty. If `performBfs` expects a non-empty collection, this could cause unexpected behavior. Add defensive validation before calling `performBfs`. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [280-292] private boolean moveNextLiteralStartMode() { if (literalStartCompleted) { return false; } literalStartCompleted = true; // Perform single BFS seeded with all literal start values - List<Object> bfsResults = performBfs(graphLookup.getStartValues()); + List<Object> startValues = graphLookup.getStartValues(); + if (startValues == null \|\| startValues.isEmpty()) { + current = new Object[] {Collections.emptyList()}; + return true; + } + List<Object> bfsResults = performBfs(startValues); // Output single row: just the hierarchy array current = new Object[] {bfsResults}; return true; } Suggestion importance[1-10]: 6 __ Why: The suggestion adds defensive null and empty checks before calling `performBfs()`, which is reasonable error handling. However, the PR code already ensures `startValues` is non-null before entering literal start mode (checked at line 211), making this check somewhat redundant. The empty list check adds value but is a minor defensive improvement rather than fixing a critical issue.	Low
	Validate start values and field nullability The code should validate that `startValuesForCalcite` is not empty after extraction to prevent silent failures with empty literal lists. Additionally, consider adding a null check before calling `node.getStartField().getField()` in piped mode to provide clearer error messages if the field is unexpectedly null. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2697-2728] List<Object> startValuesForCalcite = null; String startFieldName; if (node.getStartValues() != null) { // Literal start mode: create empty LogicalValues as dummy source (BiRel needs two inputs) // And will ignore the previous pipe then. RelDataType dummyType = builder .getTypeFactory() .createStructType( List.of(builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)), List.of("_dummy")); builder.values(dummyType); startFieldName = null; startValuesForCalcite = new ArrayList<>(); for (var lit : node.getStartValues()) { startValuesForCalcite.add(lit.getValue()); } + if (startValuesForCalcite.isEmpty()) { + throw new SemanticCheckException("Start values list cannot be empty for graphLookup."); + } } else { if (node.getChild().isEmpty()) { throw new SemanticCheckException( "Field reference start requires a piped source." + " Use literal start values (e.g. start='value') for top-level graphLookup."); } // Piped mode: visit source child visitChildren(node, context); // TODO: Limit the number of source rows to 100 for now, make it configurable. builder.limit(0, 100); if (node.isBatchMode()) { tryToRemoveMetaFields(context, true); } + if (node.getStartField() == null) { + throw new SemanticCheckException("Start field cannot be null in piped mode."); + } startFieldName = node.getStartField().getField().toString(); } Suggestion importance[1-10]: 5 __ Why: The suggestion adds defensive validation for empty `startValuesForCalcite` and null `startField`, which could prevent runtime errors. However, the empty list check may be overly strict since the parser should already validate syntax, and the null check for `startField` in piped mode is reasonable but represents a defensive measure rather than addressing a critical bug in the PR code.	Low

Previous suggestions

Suggestions up to commit c9258d9

Category	Suggestion	Impact
Possible issue	Guard against null dereference in current() `current()` is called before `moveNext()` returns `true` in some Calcite enumerator patterns, and `current` is initialized to `null`. Accessing `current[0]` when `current` is `null` will throw a `NullPointerException`. Add a null guard consistent with how the rest of the enumerator handles uninitialized state. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [257-264] @Override public Object current() { // Literal start mode: single column output, Calcite expects scalar value if (graphLookup.getStartValues() != null) { - return current[0]; + return current != null ? current[0] : null; } // source fields + output array (normal mode) or [source array, lookup array] (batch mode) return current; } Suggestion importance[1-10]: 5 __ Why: The `current` field is initialized to `null` and `current()` could be called before `moveNext()` in some patterns. Accessing `current[0]` when `current` is `null` would throw a `NullPointerException`. The null guard is a valid defensive improvement, though Calcite's contract typically requires `moveNext()` to be called first.	Low
Possible issue	Add error handling for unrecognized start clause If none of the three branches match (e.g., the grammar produces an unexpected parse tree), both `startField` and `startValues` remain `null`, which will silently produce a broken `GraphLookup` node. Add an `else` branch that throws a descriptive `SemanticCheckException` to fail fast with a clear error message. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1575-1596] if (startCtx.startField != null) { // Piped mode: start=fieldExpression startField = (Field) internalVisitExpression(startCtx.startField); } else if (startCtx.startValue != null) { // Top-level mode: single literal e.g. start="Jack" startValues = List.of((Literal) internalVisitExpression(startCtx.startValue)); } else if (startCtx.searchLiteralList() != null) { - ... + // Top-level mode: literal list e.g. start=("Jack", "Eliot") + OpenSearchPPLParser.SearchLiteralsContext listCtx = + (OpenSearchPPLParser.SearchLiteralsContext) startCtx.searchLiteralList(); + startValues = new ArrayList<>(); + for (OpenSearchPPLParser.SearchLiteralContext lit : listCtx.searchLiteral()) { + if (lit.stringLiteral() != null) { + startValues.add((Literal) internalVisitExpression(lit.stringLiteral())); + } else if (lit.numericLiteral() != null) { + startValues.add((Literal) internalVisitExpression(lit.numericLiteral())); + } else { + startValues.add(new Literal(lit.getText(), DataType.STRING)); + } + } +} else { + throw new SemanticCheckException( + "graphLookup: start clause must specify a field reference or literal value(s)."); } Suggestion importance[1-10]: 4 __ Why: While adding an `else` branch for unrecognized start clauses is a reasonable defensive measure, the grammar in `OpenSearchPPLParser.g4` already constrains the valid alternatives, making this case practically unreachable. The suggestion is valid but has low practical impact.	Low
General	Validate literal start list is non-empty After parsing the literal list, `startValues` could be empty if `searchLiteralList` matched but contained no literals. An empty `startValues` list would cause a BFS with no seeds, producing a confusing empty result. Add a validation check to throw a `SemanticCheckException` when the list is empty. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1581-1596] } else if (startCtx.searchLiteralList() != null) { - // Top-level mode: literal list e.g. start=("Jack", "Eliot") OpenSearchPPLParser.SearchLiteralsContext listCtx = (OpenSearchPPLParser.SearchLiteralsContext) startCtx.searchLiteralList(); startValues = new ArrayList<>(); for (OpenSearchPPLParser.SearchLiteralContext lit : listCtx.searchLiteral()) { - ... + if (lit.stringLiteral() != null) { + startValues.add((Literal) internalVisitExpression(lit.stringLiteral())); + } else if (lit.numericLiteral() != null) { + startValues.add((Literal) internalVisitExpression(lit.numericLiteral())); + } else { + startValues.add(new Literal(lit.getText(), DataType.STRING)); + } + } + if (startValues.isEmpty()) { + throw new SemanticCheckException( + "graphLookup: start literal list must contain at least one value."); } } Suggestion importance[1-10]: 4 __ Why: An empty `startValues` list after parsing `searchLiteralList` is theoretically possible if the grammar allows it, but the grammar rule `searchLiteralList` likely requires at least one element. The validation is a reasonable defensive check but has limited practical impact.	Low

Suggestions up to commit 98d35f4

Category	Suggestion	Impact
Possible issue	Fix grammar rule ordering for correct literal parsing The grammar rule ordering matters for ANTLR parsing. `fieldExpression` likely matches `literalValue` as well (since a literal can be a valid field expression), so the `startField = fieldExpression` alternative may consume literal values before the `startValue = literalValue` alternative is tried. The `startValue = literalValue` alternative should be placed before `startField = fieldExpression` to ensure literals are matched first. ppl/src/main/antlr/OpenSearchPPLParser.g4 [663-667] startClause : START EQUAL LT_PRTHS searchLiteralList RT_PRTHS + \| START EQUAL startValue = literalValue \| START EQUAL startField = fieldExpression - \| START EQUAL startValue = literalValue ; Suggestion importance[1-10]: 7 __ Why: ANTLR uses ordered choice, so if `fieldExpression` can match literals (e.g., identifiers or quoted strings), the `startField = fieldExpression` alternative would consume literal values before `startValue = literalValue` is tried. Placing `literalValue` before `fieldExpression` ensures correct disambiguation.	Medium
General	Validate startFieldIndex is found in source schema In literal start mode, `sourceEnumerator` is set to `null`, but `startFieldIndex` is declared as `private int` (non-final after the PR change). The field is initialized to `0` by default (Java default for `int`), and only set to `-1` in the literal start branch. However, in `moveNextNormalMode()` and `moveNextBatchMode()`, `startFieldIndex` is used to index into source rows. If somehow the wrong mode is entered, index `0` could be used silently. The `-1` assignment is correct, but the field should remain `final` or be validated before use to prevent misuse. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [211-220] -if (graphLookup.getStartValues() != null) { - this.sourceEnumerator = null; - this.startFieldIndex = -1; - } else if (graphLookup.getSource() instanceof Scannable scannable) { - Enumerable<?> sourceEnum = scannable.scan(); - this.sourceEnumerator = (Enumerator<@Nullable Object>) sourceEnum.enumerator(); - } else { +// In the constructor, after the null check: +if (graphLookup.getStartValues() == null) { + List<String> sourceFields = graphLookup.getSource().getRowType().getFieldNames(); + this.startFieldIndex = sourceFields.indexOf(graphLookup.getStartField()); + if (this.startFieldIndex < 0) { throw new IllegalStateException( - "Source must be Scannable, got: " + graphLookup.getSource().getClass()); + "Start field '" + graphLookup.getStartField() + "' not found in source schema: " + sourceFields); } +} Suggestion importance[1-10]: 5 __ Why: The `improved_code` restructures the `startFieldIndex` initialization with a validation check, which is a good defensive practice. If `startField` is not found in the source schema, `indexOf` returns `-1`, which would cause an `ArrayIndexOutOfBoundsException` at runtime rather than a clear error message.	Low
	Validate that literal start values list is non-empty `builder.values(dummyType)` with no rows creates an empty `LogicalValues`, but the comment says it's a "dummy source". However, `startValuesForCalcite` is initialized as an empty list and then populated, but if `node.getStartValues()` is an empty list (not null), the BFS will have no seeds and produce no results without any error. Consider adding a check that `startValues` is non-empty before proceeding. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2712-2725] if (node.getStartValues() != null) { + if (node.getStartValues().isEmpty()) { + throw new SemanticCheckException( + "graphLookup start literal list must contain at least one value."); + } // Literal start mode: create empty LogicalValues as dummy source (BiRel needs two inputs) RelDataType dummyType = builder .getTypeFactory() .createStructType( List.of(builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)), List.of("_dummy")); builder.values(dummyType); Suggestion importance[1-10]: 4 __ Why: An empty `startValues` list (non-null) would silently produce no BFS results without any error. Adding an explicit check improves user experience by providing a clear error message, though the grammar makes this case unlikely in practice.	Low
	Add validation for missing start clause alternatives If neither `startField`, `startValue`, nor `searchLiteralList` is present, both `startField` and `startValues` remain `null`, which could lead to a `NullPointerException` or silent incorrect behavior downstream. An explicit validation should be added to throw a meaningful error when none of the expected start clause alternatives are matched. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1575-1596] if (startCtx.startField != null) { // Piped mode: start=fieldExpression startField = (Field) internalVisitExpression(startCtx.startField); } else if (startCtx.startValue != null) { // Top-level mode: single literal e.g. start="Jack" startValues = List.of((Literal) internalVisitExpression(startCtx.startValue)); } else if (startCtx.searchLiteralList() != null) { + // Top-level mode: literal list e.g. start=("Jack", "Eliot") + ... +} else { + throw new SemanticCheckException( + "graphLookup requires a start clause: either a field reference or literal value(s)."); +} Suggestion importance[1-10]: 3 __ Why: While the grammar enforces that one of the three alternatives must match, adding a defensive else-branch with a clear error message is a minor robustness improvement. In practice, ANTLR would fail to parse before reaching this code if none of the alternatives match.	Low

Suggestions up to commit dd282e5

Category	Suggestion	Impact
Possible issue	Add fallback error handling for missing start clause There is no validation that at least one of `startField`, `startValue`, or `searchLiteralList()` is non-null. If the grammar somehow produces a `startClause` where all three are null (e.g., due to a future grammar change or error recovery), both `startField` and `startValues` will remain null, leading to a silent failure or a NullPointerException downstream. Add an `else` branch to throw a descriptive parse/semantic exception. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1575-1596] if (startCtx.startField != null) { // Piped mode: start=fieldExpression startField = (Field) internalVisitExpression(startCtx.startField); } else if (startCtx.startValue != null) { // Top-level mode: single literal e.g. start="Jack" startValues = List.of((Literal) internalVisitExpression(startCtx.startValue)); } else if (startCtx.searchLiteralList() != null) { + // Top-level mode: literal list e.g. start=("Jack", "Eliot") ... +} else { + throw new SemanticCheckException( + "graphLookup: 'start' clause must specify a field reference or literal value(s)."); } Suggestion importance[1-10]: 4 __ Why: The grammar enforces that one of the three alternatives must match, so in practice this else branch would never be reached. However, it's a reasonable defensive measure for future grammar changes. The improvement is minor since the grammar already constrains valid inputs.	Low
General	Validate that literal start values list is non-empty If `node.getStartValues()` is a non-null but empty list (which is technically possible if the grammar parses `start=()` or similar edge case), `startValuesForCalcite` will be an empty list and `performBfs` will be called with no seeds, potentially causing unexpected behavior or an empty BFS that silently returns no results. Add a validation check to ensure `startValues` is non-empty before proceeding. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2712-2725] if (node.getStartValues() != null) { + if (node.getStartValues().isEmpty()) { + throw new SemanticCheckException( + "graphLookup: 'start' literal list must contain at least one value."); + } // Literal start mode: create empty LogicalValues as dummy source (BiRel needs two inputs) RelDataType dummyType = builder .getTypeFactory() .createStructType( List.of(builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)), List.of("_dummy")); builder.values(dummyType); startFieldName = null; startValuesForCalcite = new ArrayList<>(); for (var lit : node.getStartValues()) { startValuesForCalcite.add(lit.getValue()); } } Suggestion importance[1-10]: 3 __ Why: While an empty `startValues` list is an edge case that the grammar likely prevents, adding this guard provides a clearer error message. The impact is low since the grammar's `searchLiteralList` rule would require at least one literal, making this scenario unlikely in practice.	Low
General	Clarify empty BFS result behavior in literal start mode The `reset()` method resets `literalStartCompleted` to `false`, but `current` is also reset to `null`. However, if `reset()` is called and then `moveNext()` is invoked again, `performBfs` will be called a second time. This is correct behavior, but the BFS result from the first call is discarded without any issue. The real concern is that `current()` returns `current[0]` for literal start mode, but if `moveNext()` returns `false` (after completion), `current()` should not be called — this is standard Enumerator contract. The code is correct in this regard, but it's worth confirming that `bfsResults` being an empty list (when no nodes are found) still produces a valid single-row output rather than being treated as "no rows". The current implementation always returns `true` once (even for empty BFS results), which correctly produces one row with an empty array — this matches the test `testTopLevelGraphLookupNonExistentStart`. No change needed here, but verify this is the intended behavior per the spec. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [280-292] private boolean moveNextLiteralStartMode() { if (literalStartCompleted) { return false; } literalStartCompleted = true; // Perform single BFS seeded with all literal start values List<Object> bfsResults = performBfs(graphLookup.getStartValues()); - // Output single row: just the hierarchy array + // Output single row: just the hierarchy array (empty list if no results found) current = new Object[] {bfsResults}; return true; } Suggestion importance[1-10]: 1 __ Why: This suggestion only asks to verify existing behavior and the `improved_code` is essentially identical to the `existing_code` with just a comment change. It provides no actionable code improvement.	Low

Suggestions up to commit d97bd7b

Category	Suggestion	Impact
Possible issue	Add error handling for unmatched start clause If none of the grammar alternatives match (e.g., due to a parser bug or future grammar change), both `startField` and `startValues` remain `null`, which will silently produce an invalid `GraphLookup` node. Add an `else` branch that throws a descriptive error to fail fast in this case. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1575-1596] if (startCtx.startField != null) { // Piped mode: start=fieldExpression startField = (Field) internalVisitExpression(startCtx.startField); } else if (startCtx.startValue != null) { // Top-level mode: single literal e.g. start="Jack" startValues = List.of((Literal) internalVisitExpression(startCtx.startValue)); } else if (startCtx.searchLiteralList() != null) { ... +} else { + throw new SemanticCheckException( + "Invalid start clause: expected a field reference or literal value(s)."); } -// No else: if none match, both startField and startValues remain null Suggestion importance[1-10]: 5 __ Why: The suggestion is valid - if none of the grammar alternatives match, both `startField` and `startValues` remain `null`, which could produce an invalid `GraphLookup` node. However, since the grammar is well-defined with exactly three alternatives, this case should be unreachable in practice, making this a defensive coding improvement rather than a critical fix.	Low
General	Initialize sentinel field value at declaration `startFieldIndex` is declared as a non-final instance field and initialized to `-1` for literal start mode or computed from source fields for piped mode. However, since it is only meaningful in piped mode, leaving it as a mutable field without a clear sentinel check could lead to accidental use of the uninitialized value (`0` by default before the constructor sets it). The field should be initialized to `-1` at declaration to make the sentinel explicit and safe. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java [186] -private int startFieldIndex; +private int startFieldIndex = -1; Suggestion importance[1-10]: 4 __ Why: The field `startFieldIndex` is already set to `-1` in the constructor for literal start mode (line 213), so the default Java value of `0` is only present briefly before the constructor runs. Initializing at declaration makes the sentinel explicit and improves readability, but is a minor defensive improvement.	Low
General	Validate non-empty literal start values list When `node.getStartValues()` is not null but is an empty list, `startValuesForCalcite` will be an empty list, which will cause the BFS to be seeded with no values and silently return empty results. This edge case should be validated early with a clear error message rather than silently producing no output. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2712-2725] if (node.getStartValues() != null) { + if (node.getStartValues().isEmpty()) { + throw new SemanticCheckException( + "Literal start values list cannot be empty. Provide at least one start value."); + } // Literal start mode: create empty LogicalValues as dummy source (BiRel needs two inputs) RelDataType dummyType = builder .getTypeFactory() .createStructType( List.of(builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)), List.of("_dummy")); builder.values(dummyType); startFieldName = null; startValuesForCalcite = new ArrayList<>(); for (var lit : node.getStartValues()) { startValuesForCalcite.add(lit.getValue()); } } else { Suggestion importance[1-10]: 4 __ Why: An empty `startValues` list is technically possible if the grammar allows it, and would silently produce empty results. Adding an early validation with a clear error message improves user experience, though the grammar's `searchLiteralList` rule likely requires at least one element, making this an edge case defensive check.	Low

…oject#5243) Add support for graphLookup as the first command in a PPL query with literal start values, instead of requiring piped input from source=. Syntax: graphLookup table start="value" edge=from-->to as output graphLookup table start=("v1", "v2") edge=from-->to as output Signed-off-by: Heng Qian <qianheng@amazon.com>

github-actions · 2026-03-23T07:34:02Z

Persistent review updated to latest commit dd282e5

Signed-off-by: Heng Qian <qianheng@amazon.com>

github-actions · 2026-03-23T07:43:13Z

Persistent review updated to latest commit 98d35f4

LantaoJin · 2026-03-23T08:24:23Z

In this mode, the startWith option accepts either a single literal value or a list of literal values

Q: Why only top-level command could accept literal value?
Couldn't be source=employees | graphLookup employees start=("Jack", "Eliot") edge=reportsTo-->name as reportingHierarchy?

docs/user/ppl/cmd/graphlookup.md

Signed-off-by: Heng Qian <qianheng@amazon.com>

github-actions · 2026-03-23T08:59:41Z

Persistent review updated to latest commit c9258d9

srikanthpadakanti · 2026-03-23T14:40:11Z

Hello @qianheng-aws
I noticed a few items from the PPL Command Development Checklist that are still pending:
Explain test
V2 unsupported test
docs/category.json and graphlookup.md

Were these intentionally left out?

qianheng-aws · 2026-03-24T02:52:39Z

Hello @qianheng-aws I noticed a few items from the PPL Command Development Checklist that are still pending: Explain test V2 unsupported test docs/category.json and graphlookup.md

Were these intentionally left out?

No, the agent didn't notice the checklist. Will call out this in the CLAUDE.md. Plan to init a shared CLAUDE.md for this repo #5242

docs/user/ppl/cmd/graphlookup.md

LantaoJin · 2026-03-24T03:02:53Z

ppl/src/main/antlr/OpenSearchPPLParser.g4


 startClause
-   : START EQUAL startField = fieldExpression
+   : START EQUAL LT_PRTHS searchLiteralList RT_PRTHS


remove LT_PRTHS and RT_PRTHS. Comma-list is enough

IN phrase has PRTH as well. Shall then be consistent?

- Add explain plan tests in CalciteExplainIT with YAML assertions - Add v2-unsupported tests in NewAddedCommandsIT - Add CalcitePPLGraphLookupIT to CalciteNoPushdownIT suite - Skip graphLookup tests when pushdown is disabled (required by impl) - Add expected plan YAML files for piped and top-level graphLookup Signed-off-by: Heng Qian <qianheng@amazon.com>

github-actions · 2026-03-24T06:26:56Z

Persistent review updated to latest commit 496a851

qianheng-aws added the enhancement New feature or request label Mar 23, 2026

qianheng-aws force-pushed the graphEnhance2 branch from d97bd7b to dd282e5 Compare March 23, 2026 07:32

Spotless check

98d35f4

Signed-off-by: Heng Qian <qianheng@amazon.com>

LantaoJin reviewed Mar 23, 2026

View reviewed changes

docs/user/ppl/cmd/graphlookup.md Show resolved Hide resolved

Ignore child pipe if using start value

c9258d9

Signed-off-by: Heng Qian <qianheng@amazon.com>

LantaoJin reviewed Mar 24, 2026

View reviewed changes

docs/user/ppl/cmd/graphlookup.md Outdated Show resolved Hide resolved

docs/user/ppl/cmd/graphlookup.md Outdated Show resolved Hide resolved

LantaoJin reviewed Mar 24, 2026

View reviewed changes

qianheng-aws changed the title ~~[Feature] Support graphLookup as top-level PPL command~~ [Feature] Support graphLookup with literal value as its start Mar 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Support graphLookup with literal value as its start#5253

[Feature] Support graphLookup with literal value as its start#5253
qianheng-aws wants to merge 4 commits intoopensearch-project:mainfrom
qianheng-aws:graphEnhance2

qianheng-aws commented Mar 23, 2026

Uh oh!

github-actions bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 23, 2026

Uh oh!

github-actions bot commented Mar 23, 2026

Uh oh!

LantaoJin commented Mar 23, 2026 •

edited

Loading

Uh oh!

Uh oh!

github-actions bot commented Mar 23, 2026

Uh oh!

srikanthpadakanti commented Mar 23, 2026

Uh oh!

qianheng-aws commented Mar 24, 2026

Uh oh!

Uh oh!

Uh oh!

LantaoJin Mar 24, 2026

Uh oh!

qianheng-aws Mar 24, 2026

Uh oh!

github-actions bot commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

qianheng-aws commented Mar 23, 2026

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 496a851)

Uh oh!

github-actions bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Mar 23, 2026

Uh oh!

github-actions bot commented Mar 23, 2026

Uh oh!

LantaoJin commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 23, 2026

Uh oh!

srikanthpadakanti commented Mar 23, 2026

Uh oh!

qianheng-aws commented Mar 24, 2026

Uh oh!

Uh oh!

Uh oh!

LantaoJin Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

qianheng-aws Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Mar 23, 2026 •

edited

Loading

(Review updated until commit `496a851`)

github-actions bot commented Mar 23, 2026 •

edited

Loading

LantaoJin commented Mar 23, 2026 •

edited

Loading