Issue with SQL Plugin in AWS OpenSearch When Using WHERE Clause with Aliases and Joins

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 2.19.0

Describe the issue:I’m using the SQL plugin in AWS OpenSearch and encountering an issue when applying a WHERE clause in a query that involves:

  • Querying using alias names

  • Joining two indices using alias names

  • Applying a WHERE condition after the join

Here’s what I’ve tried:

  1. Querying with alias names only – works fine.

  2. Joining two indices using alias names – also works fine.

  3. Adding a WHERE condition to the joined query – throws an error.

I’ve attached the query and the error message below. Could someone help me understand why the WHERE clause causes an issue in this context and how to resolve it?

Configuration:

Relevant Logs or Screenshots:

POST _plugins/_sql?format=json
{
“query” : “SELECT obj.*, process.log.attributes.Image, process.log.attributes.ProcessId FROM detection_data as obj INNER JOIN process_creation as process ON obj.spanId = process.log.attributes.EventUuId WHERE obj.spanId = ‘dfb1db72-7ec7-4e89-b753-57638759b4f6’”
}

“{\n “error”: {\n “reason”: “Error occurred in OpenSearch engine: no such index [The index ‘detection_data’ could not be found. Note that wildcard indices are not permitted in SQL.]”,\n “details”: “[The index ‘detection_data’ could not be found. Note that wildcard indices are not permitted in SQL.] IndexNotFoundException[no such index [The index ‘detection_data’ could not be found. Note that wildcard indices are not permitted in SQL.]]\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.”,\n “type”: “IndexNotFoundException”\n },\n “status”: 404\n}”

@Aravinth This would appear to be a bug, I suggest raising an issue here

Once of the workarounds could potentially be to use filtered aliases:

POST /_aliases 
{"actions":[{"add":{"index":"detection_data-000001","alias":"detection_data_filtered","filter":{"term":{"spanId":"dfb1db72-7ec7-4e89-b753-57638759b4f6"}}}}]}

Then join aliases:

POST _plugins/_sql?format=json 
{"query":"SELECT dd.spanId, pc.log.attributes.Image, pc.log.attributes.ProcessId FROM detection_data_filtered AS dd INNER JOIN process_creation AS pc ON dd.spanId = pc.log.attributes.EventUuId"}

@Anthony Thanks for the response. Raised an issue [Feature Request] Support for Index Aliases in SQL Queries with JOIN and WHERE Clauses · Issue #19149 · opensearch-project/OpenSearch · GitHub

But I don’t think I can use filtered alias in my case because my filter is dynamic. I would like to know is there any other alternate solution for this ?

1 Like

@Aravinth If you using 2.19 you could use 2 step DLS query, then combine the results on the client side, see sample script below:

BASE="https://localhost:9200"
AUTH="admin:admin"
SPANID="dfb1db72-7ec7-4e89-b753-57638759b4f6"
DET_INDEX="detection_data"        # or detection_data-000001
PROC_INDEX="process_creation"     # or process_creation-000001

OBJ_JSON=$(curl -sku "$AUTH" -H 'Content-Type: application/json' "$BASE/$DET_INDEX/_search" -d "{
  \"size\": 1000,
  \"query\": { \"match_phrase\": { \"spanId\": \"$SPANID\" } }
}")

PROC_JSON=$(curl -sku "$AUTH" -H 'Content-Type: application/json' "$BASE/$PROC_INDEX/_search" -d "{
  \"size\": 1000,
  \"_source\": [\"log.attributes.Image\",\"log.attributes.ProcessId\",\"log.attributes.EventUuId\"],
  \"query\": { \"term\": { \"log.attributes.EventUuId\": \"$SPANID\" } }
}")

# Join on spanId == log.attributes.EventUuId
printf '%s\n%s\n' "$OBJ_JSON" "$PROC_JSON" \
| jq -s '
  .[0].hits.hits as $objs |
  .[1].hits.hits as $prows |
  { rows: [
      $prows[] as $p
      | $objs[]
      | select(._source.spanId == $p._source.log.attributes.EventUuId)
      | (._source + {
           process_Image:     ($p._source.log.attributes.Image // null),
           process_ProcessId: ($p._source.log.attributes.ProcessId // null)
        })
    ] }'

In >= OS3.0 you could also use try using PPL to achieve something similar.

Sample PPL:

POST _plugins/_ppl 
{ "query": "search source=detection_data as d | where spanId = 'dfb1db72-7ec7-4e89-b753-57638759b4f6' | inner join on d.spanId = b.log.attributes.EventUuId [ search source=process_creation as b | fields log.attributes.EventUuId,log.attributes.Image,log.attributes.ProcessId ] | fields d.spanId, b.log.attributes.Image, b.log.attributes.ProcessId" }

You would need to enable plugins.calcite.enabled": true, using below:

PUT _plugins/_query/settings
{
  "transient": { "plugins.calcite.enabled": true }
}

Hope this helps

1 Like