Skip to content

Add AnalyticsExecutionEngine and query routing for Parquet-backed indices#5258

Draft
ahkcs wants to merge 4 commits intoopensearch-project:feature/mustang-ppl-integrationfrom
ahkcs:feature/mustang-analytics-execution-engine
Draft

Add AnalyticsExecutionEngine and query routing for Parquet-backed indices#5258
ahkcs wants to merge 4 commits intoopensearch-project:feature/mustang-ppl-integrationfrom
ahkcs:feature/mustang-analytics-execution-engine

Conversation

@ahkcs
Copy link
Collaborator

@ahkcs ahkcs commented Mar 23, 2026

Summary

Implements the execution engine adapter and query routing infrastructure for Project Mustang's unified query pipeline (#5247).

  • QueryPlanExecutor@FunctionalInterface contract for analytics engine plan execution
  • AnalyticsExecutionEngine — Implements ExecutionEngine, bridges QueryPlanExecutor output to QueryResponse pipeline
  • RestUnifiedQueryAction — Orchestrates the analytics query path: routing, planning via UnifiedQueryPlanner, execution, response formatting on sql-worker thread pool
  • RestPPLQueryAction — Routing branch: parquet_ prefixed indices → analytics path; all others → existing Lucene path
  • StubQueryPlanExecutor — Canned data for parquet_logs and parquet_metrics tables for development/testing
  • AnalyticsPPLIT — Integration tests validating full pipeline end-to-end

Test plan

  • 12 unit tests for AnalyticsExecutionEngine (type mapping, row conversion, query size limit, null handling, error propagation, explain)
  • 10 unit tests for RestUnifiedQueryAction (index name extraction, routing detection)
  • 6 integration tests (AnalyticsPPLIT) — full pipeline with stub executor against real test cluster
  • Full regression test suite

…ices (opensearch-project#5247)

Implements the execution engine adapter and query routing infrastructure
for Project Mustang's unified query pipeline (Option B). PPL queries
targeting parquet_ prefixed indices are routed through UnifiedQueryPlanner
and AnalyticsExecutionEngine instead of the existing Lucene path.

Key changes:
- QueryPlanExecutor interface: contract for analytics engine execution
- AnalyticsExecutionEngine: converts QueryPlanExecutor results to QueryResponse
- RestUnifiedQueryAction: orchestrates schema building, planning, execution
- RestPPLQueryAction: routing branch for parquet_ indices
- StubQueryPlanExecutor: canned data for development/testing

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs added PPL Piped processing language enhancement New feature or request labels Mar 23, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 23, 2026

PR Reviewer Guide 🔍

(Review updated until commit ac9ca64)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Core AnalyticsExecutionEngine and QueryPlanExecutor interface

Relevant files:

  • core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java
  • core/src/main/java/org/opensearch/sql/executor/analytics/QueryPlanExecutor.java
  • core/src/test/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngineTest.java

Sub-PR theme: Plugin-level routing, stub executor, and integration tests

Relevant files:

  • plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java
  • plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java
  • plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java
  • plugin/src/main/java/org/opensearch/sql/plugin/SQLPlugin.java
  • plugin/src/test/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryActionTest.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/AnalyticsPPLIT.java

⚡ Recommended focus areas for review

JSON Injection

The reportError method manually constructs a JSON string by embedding e.getClass().getSimpleName() directly without sanitization. A class name could theoretically contain characters that break JSON structure. More critically, the manual escaping of reason is incomplete — it does not escape tab characters (\t) or other control characters (Unicode control chars), which can produce invalid JSON. Use a proper JSON serializer (e.g., Jackson or Gson) instead of string concatenation.

private static void reportError(RestChannel channel, Exception e) {
  RestStatus status =
      isClientError(e) ? RestStatus.BAD_REQUEST : RestStatus.INTERNAL_SERVER_ERROR;
  String reason = e.getMessage() != null ? e.getMessage() : "Unknown error";
  // Escape characters that would break JSON
  reason =
      reason.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n").replace("\r", "");
  channel.sendResponse(
      new BytesRestResponse(
          status,
          "application/json; charset=UTF-8",
          "{\"error\":{\"type\":\""
              + e.getClass().getSimpleName()
              + "\",\"reason\":\""
              + reason
              + "\"},\"status\":"
              + status.getStatus()
              + "}"));
}
NullPointerException as Client Error

isClientError classifies NullPointerException as a client error (HTTP 400). NPEs are almost always programming bugs (server errors), not malformed user input. This will mislead users and hide server-side bugs by returning 400 instead of 500.

private static boolean isClientError(Exception e) {
  return e instanceof SyntaxCheckException
      || e instanceof SemanticCheckException
      || e instanceof IllegalArgumentException
      || e instanceof NullPointerException;
}
Stub in Production Code

StubQueryPlanExecutor is placed in plugin/src/main/java (production source set) and wired directly into RestPPLQueryAction via new StubQueryPlanExecutor(). This means stub/canned data ships in the production plugin artifact. It should be in a test source set or behind a feature flag to prevent accidental exposure in production deployments.

public class StubQueryPlanExecutor implements QueryPlanExecutor {

  @Override
  public Iterable<Object[]> execute(RelNode plan, Object context) {
    // Return canned rows matching the stub schema defined in RestUnifiedQueryAction.
    // The column order must match the schema: ts, status, message, ip_addr
    // (for parquet_logs table). For other tables, return empty results.
    String tableName = extractTableName(plan);
    if (tableName != null && tableName.contains("parquet_logs")) {
      return List.of(
          new Object[] {
            Instant.parse("2024-01-15T10:30:00Z"), 200, "Request completed", "192.168.1.1"
          },
          new Object[] {
            Instant.parse("2024-01-15T10:31:00Z"), 200, "Health check OK", "192.168.1.2"
          },
          new Object[] {
            Instant.parse("2024-01-15T10:32:00Z"), 500, "Internal server error", "192.168.1.3"
          });
    }
    if (tableName != null && tableName.contains("parquet_metrics")) {
      return List.of(
          new Object[] {Instant.parse("2024-01-15T10:30:00Z"), 75.5, 8192.5, "host-1"},
          new Object[] {Instant.parse("2024-01-15T10:31:00Z"), 82.3, 7680.5, "host-2"});
    }
    return List.of();
  }

  private String extractTableName(RelNode plan) {
    // Use RelOptUtil.toString to get the full plan tree including child nodes
    String planStr = RelOptUtil.toString(plan);
    if (planStr.contains("parquet_logs")) {
      return "parquet_logs";
    }
    if (planStr.contains("parquet_metrics")) {
      return "parquet_metrics";
    }
    return null;
  }
Routing Bypass Risk

isUnifiedQueryPath uses a simple regex on the raw query string to detect parquet_ prefix. A query like source = my_index | where message = 'parquet_logs' would not be misrouted (the regex matches the source clause), but a query like source = parquet_logs_backup would be routed to the analytics engine even if it's a regular Lucene index. The prefix check is applied to the full table name segment, so parquet_logs_backup would match. This could cause unexpected failures for any Lucene index whose name starts with parquet_.

public static boolean isUnifiedQueryPath(String query) {
  if (query == null) {
    return false;
  }
  String indexName = extractIndexName(query);
  if (indexName == null) {
    return false;
  }
  // Handle qualified names like "catalog.parquet_logs" — check the last segment
  int lastDot = indexName.lastIndexOf('.');
  String tableName = lastDot >= 0 ? indexName.substring(lastDot + 1) : indexName;
  return tableName.startsWith("parquet_");
}
Null Context Passed

In execute(RelNode, CalcitePlanContext, ResponseListener), planExecutor.execute(plan, null) always passes null as the execution context. The QueryPlanExecutor interface documents the second parameter as "execution context (opaque to avoid server dependency)". Passing null unconditionally means the real analytics engine (when substituted) will receive no context, likely causing a NullPointerException or incorrect behavior at runtime.

Iterable<Object[]> rows = planExecutor.execute(plan, null);

@github-actions
Copy link
Contributor

github-actions bot commented Mar 23, 2026

PR Code Suggestions ✨

Latest suggestions up to ac9ca64

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove NullPointerException from client error classification

NullPointerException is a programming error (server bug), not a client error.
Classifying it as a client error will return HTTP 400 for internal null pointer
bugs, masking server-side issues and making debugging harder. Remove
NullPointerException from the client error classification.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [262-267]

 private static boolean isClientError(Exception e) {
   return e instanceof SyntaxCheckException
       || e instanceof SemanticCheckException
-      || e instanceof IllegalArgumentException
-      || e instanceof NullPointerException;
+      || e instanceof IllegalArgumentException;
 }
Suggestion importance[1-10]: 7

__

Why: NullPointerException is a programming/server error, not a client error. Classifying it as a client error (HTTP 400) would mask server-side bugs and make debugging harder. This is a valid correctness concern.

Medium
Security
Fix incomplete JSON escaping in error responses

Manual JSON string construction is fragile and prone to injection if the error
message contains characters like <, >, or tab characters that are not escaped. The
current escaping also strips \r entirely rather than escaping it, which is
inconsistent. Use a proper JSON serialization approach or at minimum also escape tab
characters (\t) and other control characters.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [284-296]

-reason =
-    reason.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n").replace("\r", "");
+reason = reason
+    .replace("\\", "\\\\")
+    .replace("\"", "\\\"")
+    .replace("\n", "\\n")
+    .replace("\r", "\\r")
+    .replace("\t", "\\t");
 channel.sendResponse(
     new BytesRestResponse(
         status,
         "application/json; charset=UTF-8",
         "{\"error\":{\"type\":\""
             + e.getClass().getSimpleName()
             + "\",\"reason\":\""
             + reason
             + "\"},\"status\":"
             + status.getStatus()
             + "}"));
Suggestion importance[1-10]: 5

__

Why: The manual JSON construction has incomplete escaping (missing \t and \r is stripped rather than escaped), which could cause malformed JSON. However, the suggestion only adds \t escaping and fixes \r, which is a minor improvement; a proper JSON library would be a better fix.

Low
General
Fix substring matching order to avoid incorrect table routing

The check for "parquet_logs" will also match "parquet_logs_archive" or any other
table name that contains "parquet_logs" as a substring, returning incorrect stub
data. Use exact matching or check for "parquet_metrics" before "parquet_logs" since
"parquet_metrics" does not contain "parquet_logs", but the reverse order is still
fragile.

plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java [49-59]

 private String extractTableName(RelNode plan) {
   // Use RelOptUtil.toString to get the full plan tree including child nodes
   String planStr = RelOptUtil.toString(plan);
+  if (planStr.contains("parquet_metrics")) {
+    return "parquet_metrics";
+  }
   if (planStr.contains("parquet_logs")) {
     return "parquet_logs";
-  }
-  if (planStr.contains("parquet_metrics")) {
-    return "parquet_metrics";
   }
   return null;
 }
Suggestion importance[1-10]: 4

__

Why: The current order checks parquet_logs before parquet_metrics, which is fine since parquet_metrics doesn't contain parquet_logs as a substring. However, reordering to check parquet_metrics first is a defensive improvement for a stub class, though the practical impact is low.

Low

Previous suggestions

Suggestions up to commit ccd1665
CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove NullPointerException from client error classification

NullPointerException should not be classified as a client error — it is a
programming bug (server error), not a user-caused bad query. Including it here will
cause server-side bugs to be silently reported as 400 Bad Request, masking real
issues.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [262-267]

 private static boolean isClientError(Exception e) {
     return e instanceof SyntaxCheckException
         || e instanceof SemanticCheckException
-        || e instanceof IllegalArgumentException
-        || e instanceof NullPointerException;
+        || e instanceof IllegalArgumentException;
   }
Suggestion importance[1-10]: 7

__

Why: NullPointerException is a programming bug (server error), not a user-caused bad query. Classifying it as a client error will mask real server-side bugs by returning 400 instead of 500, making debugging harder.

Medium
Guard thread pool scheduling against uncaught exceptions

If SQL_WORKER_THREAD_POOL_NAME does not exist at the time of scheduling (e.g.,
during startup or in tests), schedule may throw an unchecked exception that is not
caught, leaving the REST channel without a response and potentially hanging the
client. Wrap the scheduling call in a try-catch to ensure the channel always
receives a response.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [116-122]

-client
-    .threadPool()
-    .schedule(
-        () -> doExecute(query, queryType, channel, isExplain),
-        new TimeValue(0),
-        SQL_WORKER_THREAD_POOL_NAME);
+try {
+  client
+      .threadPool()
+      .schedule(
+          () -> doExecute(query, queryType, channel, isExplain),
+          new TimeValue(0),
+          SQL_WORKER_THREAD_POOL_NAME);
+} catch (Exception e) {
+  recordFailureMetric(e);
+  reportError(channel, e);
+}
Suggestion importance[1-10]: 5

__

Why: If schedule throws (e.g., unknown thread pool name), the channel would hang without a response. Wrapping in try-catch is a reasonable defensive measure, though this scenario is unlikely in normal operation.

Low
Security
Fix incomplete JSON escaping in error response

Manual JSON string construction via concatenation is fragile — it does not handle
all special characters (e.g., tab \t, null bytes, Unicode control characters), which
can produce invalid JSON and potentially expose injection vectors. Use a proper JSON
serialization approach or at minimum add escaping for tab and other control
characters.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [284-296]

 reason =
-    reason.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n").replace("\r", "");
+    reason.replace("\\", "\\\\")
+        .replace("\"", "\\\"")
+        .replace("\n", "\\n")
+        .replace("\r", "\\r")
+        .replace("\t", "\\t");
 channel.sendResponse(
     new BytesRestResponse(
         status,
         "application/json; charset=UTF-8",
         "{\"error\":{\"type\":\""
             + e.getClass().getSimpleName()
             + "\",\"reason\":\""
             + reason
             + "\"},\"status\":"
             + status.getStatus()
             + "}"));
Suggestion importance[1-10]: 5

__

Why: The manual JSON construction misses tab and other control characters, which could produce invalid JSON. Adding \t escaping and ideally using a proper JSON library would be more robust, though the security risk is limited since the content comes from exception messages.

Low
General
Fix substring matching order to avoid wrong stub data

"parquet_logs" is a substring of "parquet_logs_archive" or similar names, and
"parquet_metrics" check order matters — but more critically, "parquet_logs" will
also match any plan string that contains "parquet_logs" as part of a longer table
name (e.g., parquet_logs_v2), returning wrong stub data. Use word-boundary or exact
matching to avoid false positives.

plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java [49-59]

 private String extractTableName(RelNode plan) {
-    // Use RelOptUtil.toString to get the full plan tree including child nodes
     String planStr = RelOptUtil.toString(plan);
+    if (planStr.contains("parquet_metrics")) {
+      return "parquet_metrics";
+    }
     if (planStr.contains("parquet_logs")) {
       return "parquet_logs";
-    }
-    if (planStr.contains("parquet_metrics")) {
-      return "parquet_metrics";
     }
     return null;
   }
Suggestion importance[1-10]: 4

__

Why: Checking "parquet_metrics" before "parquet_logs" avoids false positives since "parquet_logs" is not a substring of "parquet_metrics", but the real concern about "parquet_logs" matching "parquet_logs_v2" is valid. However, this is a stub for development/testing only, so the impact is limited.

Low
Suggestions up to commit 10d08f2
CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove NullPointerException from client error classification

NullPointerException is a programming error (server bug), not a client error.
Including it in isClientError will cause server-side null pointer bugs to be
silently reported as 400 Bad Request, masking real issues and making debugging
harder. Remove it from the client error classification.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [262-267]

 private static boolean isClientError(Exception e) {
     return e instanceof SyntaxCheckException
         || e instanceof SemanticCheckException
-        || e instanceof IllegalArgumentException
-        || e instanceof NullPointerException;
+        || e instanceof IllegalArgumentException;
   }
Suggestion importance[1-10]: 7

__

Why: NullPointerException is a programming/server error, not a client error. Classifying it as a client error would return 400 Bad Request for server-side bugs, masking real issues. This is a valid correctness concern.

Medium
Fix table name matching order to avoid substring false positives

The check for "parquet_logs" will also match "parquet_logs_archive" or any other
table whose name contains "parquet_logs" as a substring, returning wrong stub data.
Additionally, "parquet_logs" is checked before "parquet_metrics", but since
"parquet_metrics" does not contain "parquet_logs", the ordering is coincidentally
safe — however, the substring matching is still fragile. Use word-boundary or
exact-match checks to avoid false positives.

plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java [49-59]

 private String extractTableName(RelNode plan) {
-    // Use RelOptUtil.toString to get the full plan tree including child nodes
     String planStr = RelOptUtil.toString(plan);
+    if (planStr.contains("parquet_metrics")) {
+      return "parquet_metrics";
+    }
     if (planStr.contains("parquet_logs")) {
       return "parquet_logs";
-    }
-    if (planStr.contains("parquet_metrics")) {
-      return "parquet_metrics";
     }
     return null;
   }
Suggestion importance[1-10]: 3

__

Why: The suggestion reorders checks to put parquet_metrics before parquet_logs, but since parquet_metrics doesn't contain parquet_logs as a substring, the original ordering is already safe. The improvement is minimal and the stub is explicitly temporary code.

Low
General
Fix incomplete JSON escaping in error responses

Manual JSON string construction via concatenation is fragile and prone to injection
if the error message contains characters like <, >, or tab characters that are not
escaped. Use a proper JSON serialization approach or at minimum add escaping for all
control characters and special JSON characters to prevent malformed JSON responses.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [284-296]

-reason =
-        reason.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n").replace("\r", "");
-    channel.sendResponse(
-        new BytesRestResponse(
-            status,
-            "application/json; charset=UTF-8",
-            "{\"error\":{\"type\":\""
-                + e.getClass().getSimpleName()
-                + "\",\"reason\":\""
-                + reason
-                + "\"},\"status\":"
-                + status.getStatus()
-                + "}"));
+reason = reason.replace("\\", "\\\\")
+               .replace("\"", "\\\"")
+               .replace("\n", "\\n")
+               .replace("\r", "\\r")
+               .replace("\t", "\\t");
+channel.sendResponse(
+    new BytesRestResponse(
+        status,
+        "application/json; charset=UTF-8",
+        "{\"error\":{\"type\":\""
+            + e.getClass().getSimpleName()
+            + "\",\"reason\":\""
+            + reason
+            + "\"},\"status\":"
+            + status.getStatus()
+            + "}"));
Suggestion importance[1-10]: 4

__

Why: The suggestion adds \t escaping to the manual JSON construction, which is a minor improvement. However, the core issue of using manual JSON concatenation remains, and the improvement is marginal since tab characters in error messages are uncommon.

Low
Document intentional null context passed to plan executor

Passing null as the execution context to planExecutor.execute is hardcoded and
ignores any context information available in CalcitePlanContext. When the real
analytics engine replaces the stub, this will silently pass no context, potentially
causing incorrect behavior or NullPointerExceptions inside the executor. Consider
passing a meaningful context object derived from context, or at minimum document why
null is intentional here.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [72]

+// TODO: Pass a real execution context derived from CalcitePlanContext when analytics engine is ready
 Iterable<Object[]> rows = planExecutor.execute(plan, null);
Suggestion importance[1-10]: 1

__

Why: This suggestion only adds a comment/TODO to existing code without changing behavior. The improved_code is functionally identical to the existing_code, so it scores very low per the guidelines.

Low
Suggestions up to commit 3b1ecb8
CategorySuggestion                                                                                                                                    Impact
Security
Fix unsafe manual JSON string construction

The error message is manually escaped only for double quotes, but other special
characters (backslashes, newlines, control characters) in the exception message can
break the JSON structure or cause injection issues. Use a proper JSON serialization
library (e.g., Jackson's ObjectMapper or org.json.JSONObject) to safely serialize
the error message.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [263-273]

 private static void reportError(RestChannel channel, Exception e) {
+  String safeMessage = e.getMessage() != null ? e.getMessage() : "Unknown error";
+  org.json.JSONObject errorJson = new org.json.JSONObject();
+  org.json.JSONObject errorDetail = new org.json.JSONObject();
+  errorDetail.put("type", e.getClass().getSimpleName());
+  errorDetail.put("reason", safeMessage);
+  errorJson.put("error", errorDetail);
+  errorJson.put("status", 500);
   channel.sendResponse(
       new BytesRestResponse(
           org.opensearch.core.rest.RestStatus.INTERNAL_SERVER_ERROR,
           "application/json; charset=UTF-8",
-          "{\"error\":{\"type\":\""
-              + e.getClass().getSimpleName()
-              + "\",\"reason\":\""
-              + (e.getMessage() != null ? e.getMessage().replace("\"", "\\\"") : "Unknown error")
-              + "\"},\"status\":500}"));
+          errorJson.toString()));
 }
Suggestion importance[1-10]: 7

__

Why: The manual JSON construction only escapes double quotes, leaving the response vulnerable to malformed JSON if the exception message contains backslashes, newlines, or other special characters. Using a proper JSON library would be safer and more robust.

Medium
Possible issue
Replace fragile string-matching with proper RelNode tree traversal

Using RelOptUtil.toString(plan) to extract the table name by string-matching the
plan's text representation is fragile and error-prone — it can produce false
positives if a table name appears in a filter literal or column alias. The table
name should be extracted by traversing the RelNode tree and inspecting TableScan
nodes directly via RelNode.getTable().

plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java [49-59]

 private String extractTableName(RelNode plan) {
-  // Use RelOptUtil.toString to get the full plan tree including child nodes
-  String planStr = RelOptUtil.toString(plan);
-  if (planStr.contains("parquet_logs")) {
-    return "parquet_logs";
+  // Walk the RelNode tree to find a TableScan and extract its table name
+  if (plan instanceof org.apache.calcite.rel.core.TableScan) {
+    java.util.List<String> qualifiedName = plan.getTable().getQualifiedName();
+    if (!qualifiedName.isEmpty()) {
+      String tableName = qualifiedName.get(qualifiedName.size() - 1);
+      if (tableName.contains("parquet_logs")) return "parquet_logs";
+      if (tableName.contains("parquet_metrics")) return "parquet_metrics";
+    }
   }
-  if (planStr.contains("parquet_metrics")) {
-    return "parquet_metrics";
+  for (RelNode input : plan.getInputs()) {
+    String found = extractTableName(input);
+    if (found != null) return found;
   }
   return null;
 }
Suggestion importance[1-10]: 5

__

Why: Using RelOptUtil.toString(plan) for table name extraction is fragile and can produce false positives if table names appear in literals or aliases. The suggested tree traversal approach is more correct, though this is a stub class intended to be replaced.

Low
Pass execution context instead of null to plan executor

Passing null as the execution context to planExecutor.execute is undocumented and
may cause a NullPointerException in the executor implementation. The
CalcitePlanContext (context) is already available in scope and should be passed (or
a derived context object) to allow the executor to access query settings,
credentials, or other runtime information.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [72]

-Iterable<Object[]> rows = planExecutor.execute(plan, null);
+Iterable<Object[]> rows = planExecutor.execute(plan, context);
Suggestion importance[1-10]: 4

__

Why: Passing null as the context is intentional per the QueryPlanExecutor interface design (context is "opaque to avoid server dependency"), but passing the available context object could provide useful runtime information. However, the interface accepts Object and the current design explicitly uses null, so this change may not be appropriate without further interface design decisions.

Low
General
Extract hardcoded routing prefix into a named constant

The routing check tableName.startsWith("parquet_") is a fragile, hardcoded
convention that will silently misroute queries if the naming convention changes or
if a legitimate non-Parquet index happens to start with "parquet_". This logic
should be driven by index metadata (e.g., index settings) rather than a name prefix,
and at minimum should be extracted into a named constant or configurable predicate
to make the coupling explicit.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [76-88]

+/** Prefix used to identify Parquet-backed indices. Replace with metadata lookup when available. */
+static final String PARQUET_INDEX_PREFIX = "parquet_";
+
 public static boolean isUnifiedQueryPath(String query) {
   if (query == null) {
     return false;
   }
   String indexName = extractIndexName(query);
   if (indexName == null) {
     return false;
   }
   // Handle qualified names like "catalog.parquet_logs" — check the last segment
   int lastDot = indexName.lastIndexOf('.');
   String tableName = lastDot >= 0 ? indexName.substring(lastDot + 1) : indexName;
-  return tableName.startsWith("parquet_");
+  return tableName.startsWith(PARQUET_INDEX_PREFIX);
 }
Suggestion importance[1-10]: 3

__

Why: Extracting "parquet_" into a named constant improves maintainability, but the suggestion itself acknowledges this is a temporary stub mechanism. The improvement is minor given the temporary nature of the code.

Low

@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 10d08f2.

PathLineSeverityDescription
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java271mediumNullPointerException is classified as a client error (4xx) alongside SyntaxCheckException. NPEs are programming bugs, not client errors. This classification could suppress legitimate server-side errors, masking failures and potentially allowing information disclosure about internal state to be treated as benign client faults.
plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java30lowStubQueryPlanExecutor completely ignores the actual Calcite plan and returns hardcoded rows based solely on table name string matching. If this stub remains in production, any query to parquet_* tables bypasses actual query evaluation (filters, access controls embedded in the plan are ignored). The code is clearly marked as a stub, but its presence in the plugin's main source path (not test) warrants attention.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java95lowQueries matching parquet_* index prefix are routed to a new execution pipeline that bypasses the standard PPLQueryAction transport action path. If the standard path performs security checks (e.g., authorization, audit logging) that the new path does not replicate, this routing could inadvertently skip those controls. This appears intentional by design but warrants review of the security controls in both paths.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 1 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

Persistent review updated to latest commit 10d08f2

- Classify client vs server errors in RestUnifiedQueryAction (syntax
  errors return 400, engine failures return 500)
- Fix JSON escaping in error responses for special characters
- Add integration tests: explain with filter/aggregation, syntax error
  handling, response format validation, regression checks
- Increment correct metrics (PPL_FAILED_REQ_COUNT_CUS vs SYS)

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the feature/mustang-analytics-execution-engine branch from 10d08f2 to ccd1665 Compare March 23, 2026 23:53
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ccd1665.

PathLineSeverityDescription
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java92highQueries targeting 'parquet_' prefixed indices are silently routed to RestUnifiedQueryAction, bypassing the normal PPLQueryAction pipeline. If the standard pipeline enforces authorization, row-level security, or audit logging, this routing logic creates a bypass for any index prefixed with 'parquet_'. The check is purely string-based on the query text, which an attacker controlling index naming could exploit.
plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java1mediumA stub/development executor is wired into the production code path in SQLPlugin.java and RestPPLQueryAction. This stub uses RelOptUtil.toString(plan) to make routing decisions and returns hardcoded data. Deploying stub code in production is anomalous and could mask the real executor with predictable, controllable behavior.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java262mediumNullPointerException is classified as a client error, causing it to increment PPL_FAILED_REQ_COUNT_CUS instead of the server-side failure metric. This suppresses server-side error visibility and could allow server bugs to appear as client-caused failures, hampering detection of exploitation attempts or internal failures.
core/src/test/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngineTest.java61lowTest code uses reflection with field.setAccessible(true) to set a final field on a mocked object. While a common testing pattern, this bypasses Java access controls and could be misused if test utilities are accidentally included in production artifacts.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 1 | Medium: 2 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ccd1665.

PathLineSeverityDescription
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java92mediumQueries targeting 'parquet_*' indices are silently routed away from the standard nodeClient.execute(PPLQueryAction.INSTANCE, ...) transport pipeline. Any authorization, auditing, or security controls implemented in that transport action are bypassed for this new code path. Legitimate for a new engine integration, but warrants review of what checks are skipped.
plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java1mediumStubQueryPlanExecutor is in the production source tree (plugin/src/main/java), not in test code. It returns hardcoded data for any 'parquet_*' query regardless of query predicates, filters, or access scope. Deploying this to production would serve fabricated data to users querying those indices and suppress real query errors. File is labeled as temporary, but it is wired into the live request path via RestPPLQueryAction.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java272lowNullPointerException is classified as a client error in isClientError(), causing it to be logged at WARN level, counted as a user-caused failure metric, and returned as HTTP 400. NPEs are almost always server-side bugs; misclassifying them obscures real server errors and could make server-side failures harder to detect and alert on.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 2 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

Persistent review updated to latest commit ccd1665

qianheng-aws and others added 2 commits March 24, 2026 11:47
Signed-off-by: Heng Qian <qianheng@amazon.com>
- Parse explain JSON and verify calcite.logical, calcite.physical (null),
  calcite.extended (null) structure
- Verify exact plan operators (LogicalProject, LogicalFilter, LogicalSort,
  LogicalAggregate) and their contents (column names, filter conditions)
- Add sort plan explain test

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ac9ca64.

PathLineSeverityDescription
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java262lowNullPointerException is classified as a client error, causing NPEs to be logged at WARN level and counted against customer error metrics rather than system error metrics. This masks server-side bugs from operators and monitoring systems, but does not appear intentionally malicious.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java52lowThe ClusterService parameter is accepted in the constructor but never used — only NodeClient is consumed. Passing an unused privileged service object is an anomaly worth noting, though no exploit of it is present in this diff.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

Persistent review updated to latest commit ac9ca64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants