[FEATURE] SQL/PPL language support for Analytics engine integration

**Is your feature request related to a problem?**

The SQL/PPL plugin currently only supports querying Lucene-backed OpenSearch indices. A new Analytics engine introduces Parquet as an alternate storage format with an embedded query engine (DataFusion) for log analytics workloads. Users expect to query Parquet-backed indices using the same `_plugins/_sql` and `_plugins/_ppl` REST endpoints with the same SQL/PPL language support, but the current plugin has no integration path to route queries to the Analytics engine for execution.

### Technical Requirements

- **UX consistency**: Same REST endpoints, response format, error messages, and explain output for both Lucene and non-Lucene queries
- **Engineering complexity**: Minimal changes to existing SQL/PPL plugin code; reuse existing parsing, formatting, and metrics infrastructure
- **Plugin coupling**: Clean API boundary between SQL/PPL plugin and Analytics engine; each can evolve independently
- **Performance**: No unnecessary performance overhead or data copying in the query handoff path
- **Resource management**: Query size limits, circuit breakers, and timeouts enforced on the unified path
- **Monitoring/Observability**: Request/failure metrics, query profiling, and logging for the unified path consistent with existing paths
- **Security**: Index-level authorization during routing; security context propagated to execution thread
- **Testability**: End-to-end integration tests with the real Analytics engine plugin loaded in the test cluster
- **Long-term alignment**: Architecture supports future ANSI SQL, cross-engine JOINs, and additional storage engines

**What solution would you like?**

### High-level Design

The SQL/PPL plugin integrates with the Analytics engine through a new Unified Query Pipeline: the plugin handles query routing, parsing, and generating engine-agnostic Calcite `RelNode` logical plans, then hands them off to the Analytics engine for physical optimization, distributed planning, and execution.

<img width="1584" height="691" alt="Image" src="https://github.com/user-attachments/assets/3f7922d0-9dcd-403c-b3c2-6ce6b2c43989" />

Here is the high-level workflow:

1. `RestSqlAction`/`RestPPLQueryAction` routes queries targeting non-Lucene indices to a new `RestUnifiedQueryAction`; Lucene queries fall through to the existing V2/V3 path unchanged.
2. `RestUnifiedQueryAction` builds a Calcite schema from cluster state via the Analytics engine's `OpenSearchSchemaBuilder` (standard SQL types, no UDTs).
3. `UnifiedQueryPlanner.plan()` parses the query (Calcite native SQL parser or PPL V3 Calcite path) and generates a logical `RelNode`.
4. `AnalyticsExecutionEngine` passes the `RelNode` directly to the Analytics engine's `QueryPlanExecutor.execute()` — no serialization needed (shared classloader via extendedPlugins).
5. Results (`Iterable<Object[]>`) are converted to `ExprValue` and formatted as JDBC JSON via the existing `JdbcResponseFormatter`.

### Component Responsibilities

| | SQL/PPL Plugin | Analytics Engine |
|---|---|---|
| Query routing | Detect non-Lucene indices by routing rules from Analytics engine and route to unified pipeline | Provide index settings-based routing rules |
| Parsing | Parse SQL (Calcite native parser) and PPL (ANTLR → CalciteRelNodeVisitor) into RelNode | — |
| Logical optimization | Apply language-specific HepPlanner rewrites (PPL dedup/stats simplification) | — |
| Physical optimization | — | Pushdown optimization by rules registered in `TableScan.register()` |
| Distributed query planning | — | Fragment the plan across data nodes, manage distributed execution and result aggregation |
| Schema | Pass schema to `UnifiedQueryContext` for planning | Provide `OpenSearchSchemaBuilder` to build Calcite `SchemaPlus` from cluster state |
| Execution | Hand off `RelNode` to `QueryPlanExecutor` | Execute plan via back-end engines, return `Iterable<Object[]>` |
| Exception handling | Catch and return error responses | Fail-fast and throw exceptions for unsupported operations |
| Response formatting | Convert results to default format via existing formatter | — |
| Observability | Metrics, logging, profiling for unified path | — |

### API Contract

- **SQL/PPL → Analytics engine**: `RelNode` logical plan with standard Calcite types and PPL/SQL UDF nodes (e.g., match, match_phrase)
- **Analytics engine → SQL/PPL**: `SchemaBuilder` for schema provisioning, `QueryPlanExecutor.execute(RelNode, context)` returning Iterable<Object[]>, routing rules for index detection

**What alternatives have you considered?**

<img width="1467" height="906" alt="Image" src="https://github.com/user-attachments/assets/6ee8c326-99ab-4da2-99b3-3356cb0345d4" />

- **Option A (Query Delegation)**: Forward raw SQL/PPL query text to the Analytics engine via transport action. Simplest coupling, but bypasses the SQL/PPL plugin entirely and does not cleanly separate the query language frontend from the execution backend.
- **Option C (Calcite Schema Adapter)**: Register a Parquet Calcite adapter in the existing PPL V3 pipeline. Tightest coupling with the current codebase — inherits the most infrastructure, but carries compatibility risk as the Analytics engine. The SQL execution path would still need a separate integration, since the current pipeline is PPL-only.

**Option B (Unified Query Pipeline)** was selected for its clean separation of query language frontend (SQL/PPL plugin) and execution backend (Analytics engine), reuse of existing plugin infrastructure, and a single integration path that serves both SQL and PPL.

**Do you have any additional context?**

- Analytics engine: https://github.com/opensearch-project/OpenSearch/tree/main/sandbox/plugins/analytics-engine
- PoC branch: https://github.com/dai-chen/sql-1/pull/10
- Pending alignment:
    - API contract finalization
    - Plugin integration mechanism (extendedPlugins vs transport action)
    - SQL search function support scope
    - Integration-ready Analytics engine build

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] SQL/PPL language support for Analytics engine integration #5246

Technical Requirements

High-level Design

Component Responsibilities

API Contract

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	SQL/PPL Plugin	Analytics Engine
Query routing	Detect non-Lucene indices by routing rules from Analytics engine and route to unified pipeline	Provide index settings-based routing rules
Parsing	Parse SQL (Calcite native parser) and PPL (ANTLR → CalciteRelNodeVisitor) into RelNode	—
Logical optimization	Apply language-specific HepPlanner rewrites (PPL dedup/stats simplification)	—
Physical optimization	—	Pushdown optimization by rules registered in `TableScan.register()`
Distributed query planning	—	Fragment the plan across data nodes, manage distributed execution and result aggregation
Schema	Pass schema to `UnifiedQueryContext` for planning	Provide `OpenSearchSchemaBuilder` to build Calcite `SchemaPlus` from cluster state
Execution	Hand off `RelNode` to `QueryPlanExecutor`	Execute plan via back-end engines, return `Iterable<Object[]>`
Exception handling	Catch and return error responses	Fail-fast and throw exceptions for unsupported operations
Response formatting	Convert results to default format via existing formatter	—
Observability	Metrics, logging, profiling for unified path	—

[FEATURE] SQL/PPL language support for Analytics engine integration #5246

Description

Technical Requirements

High-level Design

Component Responsibilities

API Contract

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions