This is a meta issue collecting a few different ideas from different sources.
The core problem is that misbehaving queries are typically hard to debug and require a lot of knowledge of PPL/SQL engine behavior, so there's room for enhancement.
Discussing the concept with @anasalkouz, there are three main problem classes we want to address:
- Something went wrong (execution error)
- 0 results are returned, why?
- The query is slow, why?
So, compiling existing issues for the backend:
- [RFC] Error Report-building Mechanism for PPL Errors #4919 is the backend for the first one. It introduces a general reporting interface that can integrate with our current exceptions and add context. We can incrementally drill into error classes over time.
- [FEATURE] Support `analyze` alongside `explain` #4343 can give the foundation for the latter two, by providing analyze metrics for results returned and timings at all stages.
Once those have reasonably complete implementations, errors will be in structured reports like this:
{
  "status": 400,
  "error": {
    "type": "SemanticCheckException",
    "code": "FIELD_NOT_FOUND",
    "reason": "Invalid Query",
    "details": "Failed to resolve field 'foo'",
    "location": [
      "while planning the query",
      "while resolving fields in the index mapping"
    ],
    "context": {
      "index_pattern": "logs-*",
      "position": {"line": 1, "column": 25},
      "query": "source=logs-* | fields foo",
      "query_id": "b6627794-3939-4ac4-8c5b-821ccc400f4f"
    },
    "suggestion": "Did you mean: 'foobar'"
  }
}
Frontends can choose to do whatever they want with this: render specific fixed pieces of context (e.g. position points to a spot in the query, so highlight it?), render details/locations/suggestions, throw it in an LLM, etc. It also helps with oncall debugging when given these responses (either directly or from HAR files). #4919 (comment) shows me doing this quickly for the SQL CLI based on a proof-of-concept implementation.
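To make the "render fixed pieces of context" idea concrete, here is a minimal sketch of how a CLI-style frontend might turn such a report into text. It is not the proof-of-concept from #4919; `render_report` is a hypothetical helper, and it assumes `position.line` and `position.column` are 1-based, which the report format above does not specify.

```python
import json

# Example payload, copied from the structured report shown above.
REPORT_JSON = """
{
  "status": 400,
  "error": {
    "type": "SemanticCheckException",
    "code": "FIELD_NOT_FOUND",
    "reason": "Invalid Query",
    "details": "Failed to resolve field 'foo'",
    "location": [
      "while planning the query",
      "while resolving fields in the index mapping"
    ],
    "context": {
      "index_pattern": "logs-*",
      "position": {"line": 1, "column": 25},
      "query": "source=logs-* | fields foo",
      "query_id": "b6627794-3939-4ac4-8c5b-821ccc400f4f"
    },
    "suggestion": "Did you mean: 'foobar'"
  }
}
"""


def render_report(report: dict) -> str:
    """Render an error report as plain text (hypothetical helper)."""
    err = report["error"]
    out = [f'{err["type"]} [{err["code"]}]: {err["details"]}']

    ctx = err.get("context", {})
    query = ctx.get("query")
    pos = ctx.get("position")
    if query is not None and pos is not None:
        query_lines = query.splitlines() or [""]
        line_no = pos.get("line", 1)
        col = pos.get("column", 1)
        # Assumption: line and column are 1-based.
        if 1 <= line_no <= len(query_lines):
            out.append(query_lines[line_no - 1])
            out.append(" " * max(col - 1, 0) + "^")

    # The location entries describe which stage the error came from.
    for loc in err.get("location", []):
        out.append(f"  {loc}")

    if err.get("suggestion"):
        out.append(f'hint: {err["suggestion"]}')
    return "\n".join(out)


if __name__ == "__main__":
    print(render_report(json.loads(REPORT_JSON)))
```

Every field other than `type`, `code`, and `details` is treated as optional here, so a frontend written this way would keep working while error classes are filled in incrementally.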
From the frontend, there's a separate meta issue:
- [META] Integrate with SQL/PPL Error Reports OpenSearch-Dashboards#11577 is the catch-all for integrating with the reporting interface in the frontend.
- Explain-analyze --> debug will be a call to action shown when a query either runs over some threshold (1000ms?) or returns 0 results.
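The trigger for that call to action could look roughly like the sketch below. The function name, the return values, and the choice to check zero results before slowness are all assumptions for illustration; the 1000 ms default is the tentative threshold floated above, not a decided value.

```python
from typing import Optional

# Tentative value from the discussion above ("1000ms?"); not a decided setting.
SLOW_QUERY_THRESHOLD_MS = 1000


def debug_prompt_reason(
    elapsed_ms: float,
    row_count: int,
    threshold_ms: float = SLOW_QUERY_THRESHOLD_MS,
) -> Optional[str]:
    """Return why a debug call to action should be shown, or None if it should not."""
    if row_count == 0:
        return "zero_results"
    if elapsed_ms > threshold_ms:
        return "slow_query"
    return None


if __name__ == "__main__":
    print(debug_prompt_reason(elapsed_ms=50, row_count=0))
    print(debug_prompt_reason(elapsed_ms=1500, row_count=10))
    print(debug_prompt_reason(elapsed_ms=50, row_count=10))
```

Returning a reason rather than a boolean lets the frontend pick different wording for the zero-result and slow-query cases.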
Once the core flows here are done, what's left is to start vetting specific error cases:
- [BUG] Circuit breaker getting triggered when multiple PPL queries are fired in parallel. #4771
- [FEATURE] Unclear PPL error message: Empty mapping on wildcard indices #4872
- [FEATURE] Improve resource manager error messaging #4869
- [BUG] `ArrayIndexOutOfBoundsException` when querying index with disabled objects containing dot-only field names #4896
- [BUG] Calcite PPL doesn't handle array value columns if codegen triggered #5065
- Field not found (issue pending)
- Suggest syntax rewrites ([FEATURE] `IS NOT NULL` condition support #5262 is a special case but it'd be nice to do it in the general case)
- Zero-result & slow query explanations on the frontend
- Optionally: anything else with the error-experience label. Feel free to suggest some!