Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch 3.1.0
Describe the issue:
Hello OpenSearch Community,
We’ve diagnosed what appears to be a version-specific, data-volume-dependent bug in OpenSearch 3.1.0 and would appreciate any insights.
The Scenario:
- A moderately complex PPL query (using parse, where, rename, etc.) runs successfully in our test environment with a small dataset.
- The exact same query fails consistently in our production environment with a large dataset.
Error Message:
When the query runs against the large production dataset, it fails with: "Error occurred while creating PIT for new engine SQL query"
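For anyone trying to reproduce this, here is a minimal sketch of how a query of this shape reaches the PPL endpoint (POST /_plugins/_ppl with a JSON body holding the query string). The cluster URL, the index app-logs, the field names and the regex are hypothetical placeholders, not our actual query. The helper only builds the request and does not send it; authentication and TLS are left out.

```python
import json
import urllib.request

# Hypothetical cluster address; replace with your own.
OPENSEARCH_URL = "https://localhost:9200"


def build_ppl_request(query: str, base_url: str = OPENSEARCH_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to the PPL endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_plugins/_ppl",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical query with the same shape as ours: parse, where, rename.
ppl = (
    "source=app-logs "
    "| parse message '(?<status>\\d{3}) (?<path>\\S+)' "
    "| where status = '500' "
    "| rename path as request_path"
)
req = build_ppl_request(ppl)
print(req.get_method(), req.full_url)
```

Actually sending it would be urllib.request.urlopen(req) plus whatever authentication and certificate handling the cluster requires.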
Key Troubleshooting Findings:
- Data Volume is the Trigger: The only difference between success and failure is the volume of data being queried. This strongly suggests the bug is triggered when PPL tries to create a Point-in-Time (PIT) snapshot for the large query.
- Standard Fix is Not Available: We attempted to disable the new SQL engine through the dynamic cluster settings API (plugins.sql.engine.new.enabled). The cluster rejected this with a 400 error: "setting... not recognized".
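For reference, the rejected call had roughly the shape below. This sketch assumes it was sent as a persistent setting through PUT _cluster/settings; the cluster address is a placeholder. The second helper builds GET _cluster/settings?include_defaults=true&flat_settings=true, which should list the registered settings together with their defaults, so the plugins.* keys it returns show what can actually be toggled in 3.1.0.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical cluster address; replace with your own.
OPENSEARCH_URL = "https://localhost:9200"


def build_settings_update(settings: dict, persistent: bool = True,
                          base_url: str = OPENSEARCH_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT _cluster/settings request."""
    scope = "persistent" if persistent else "transient"
    body = json.dumps({scope: settings}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_settings_listing(base_url: str = OPENSEARCH_URL) -> urllib.request.Request:
    """Build a GET that returns all settings, defaults included, as flat keys."""
    params = urllib.parse.urlencode({"include_defaults": "true", "flat_settings": "true"})
    return urllib.request.Request(f"{base_url}/_cluster/settings?{params}", method="GET")


# The call that 3.1.0 rejected with a 400 (assumed to have been sent as persistent).
rejected = build_settings_update({"plugins.sql.engine.new.enabled": False})
# Lists the settings the cluster does recognize, to look for plugins.* alternatives.
listing = build_settings_listing()
```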
Conclusion:
We believe this is a bug in OpenSearch 3.1.0 where PPL incorrectly calls the new SQL engine when a PIT is required, and the API to disable this behavior is not available in this specific version.
Questions:
- We have been unable to find any official documentation for the error “Error occurred while creating PIT for new engine SQL query”. Can anyone explain what this error signifies? Does it relate to user permissions for creating a PIT, or does it point to a deeper backend issue?
- Given that we cannot use the dynamic API setting in 3.1.0, has anyone found a workaround for this specific issue, other than a full version upgrade or a static change to opensearch.yml?
- Why does the error mention SQLException for a PPL query? Does PPL call the SQL engine under the hood? Is there a missing configuration that causes this?
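On the first question, one check we can run ourselves is to open a PIT directly, with the same user and against the same index pattern the PPL query reads, via POST /<index>/_search/point_in_time?keep_alive=1m. If that direct call is refused with a 403, a permissions problem looks more likely; if it succeeds, the failure more likely lies elsewhere in the query path. The sketch below only builds the request; the index pattern and cluster address are hypothetical, and any PIT it does create should be released afterwards (DELETE /_search/point_in_time).

```python
import urllib.parse
import urllib.request

# Hypothetical cluster address; replace with your own.
OPENSEARCH_URL = "https://localhost:9200"


def build_create_pit_request(index: str, keep_alive: str = "1m",
                             base_url: str = OPENSEARCH_URL) -> urllib.request.Request:
    """Build (but do not send) a request that opens a PIT on `index`.

    `index` may be a single index, a comma-separated list, or a wildcard
    pattern; '*' and ',' are left unescaped so the pattern stays intact.
    """
    target = urllib.parse.quote(index, safe="*,")
    params = urllib.parse.urlencode({"keep_alive": keep_alive})
    return urllib.request.Request(
        f"{base_url}/{target}/_search/point_in_time?{params}",
        method="POST",
    )


# Hypothetical index pattern: use the one the failing PPL query reads.
req = build_create_pit_request("app-logs-*")
print(req.get_method(), req.full_url)
```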
Our conclusion is that this is a bug where PPL incorrectly calls the new SQL engine, but we would appreciate any confirmation or alternative explanations.