I have this issue, reproduced simply bellow:
Create 2 indices test-a
and test-b
. Each has a unique record, not existing on the other.
ESHOST=http://localhost:9200
CURLOPT="-s -u admin:admin"
# create test-a with 3 records: 1, 2, 3 -- not including record 4
curl --location --request POST '${ESHOST}/test-a/_bulk?pretty' \
$CURLOPT \
--header 'Content-Type: application/json' \
--data-raw '{"index":{"_id":"1"}}
{"iCCID":"1"}
{"index":{"_id":"2"}}
{"iCCID":"2"}
{"index":{"_id":"3"}}
{"iCCID":"3"}'
# create test-b with 3 records: 1, 2, 4 -- not including record 3
curl --location --request POST '${ESHOST}/test-b/_bulk?pretty' \
$CURLOPT \
--header 'Content-Type: application/json' \
--data-raw '{"index":{"_id":"1"}}
{"iCCID":"1"}
{"index":{"_id":"2"}}
{"iCCID":"2"}
{"index":{"_id":"3"}}
{"iCCID":"4"}'
LEFT JOIN on those indices seemed promising at first:
ESHOST=http://localhost:9200
CURLOPT="-s -u admin:admin"
curl --location --request POST '${ESHOST}/_opendistro/_sql' \
$CURLOPT \
--header 'Content-Type: application/json' \
--data-raw '{"query" : "SELECT a.iCCID iCCID, b.iCCID iCCID_b FROM test-a a LEFT JOIN test-b b ON a.iCCID = b.iCCID"}'
Response from LEFT JOIN looks correct:
{
"schema": [
{
"name": "a.iCCID",
"alias": "iCCID",
"type": "text"
},
{
"name": "b.iCCID",
"alias": "iCCID_b",
"type": "text"
}
],
"total": 3,
"datarows": [
[
"1",
"1"
],
[
"2",
"2"
],
[
"3",
null
]
],
"size": 3,
"status": 200
}
So I go on trying a LEFT JOIN, this time adding a WHERE condition:
ESHOST=http://localhost:9200
CURLOPT="-s -u admin:admin"
curl --location --request POST '${ESHOST}/_opendistro/_sql' \
$CURLOPT \
--header 'Content-Type: application/json' \
--data-raw '{"query" : "SELECT a.iCCID iCCID, b.iCCID iCCID_b FROM test-a a LEFT JOIN test-b b ON a.iCCID = b.iCCID where b.iCCID is null"}'
Response from LEFT JOIN with WHERE condition:
{
"schema": [
{
"name": "a.iCCID",
"alias": "iCCID",
"type": "text"
},
{
"name": "b.iCCID",
"alias": "iCCID_b",
"type": "text"
}
],
"total": 3,
"datarows": [
[
"2",
null
],
[
"3",
null
],
[
"1",
null
]
],
"size": 3,
"status": 200
}
Here, i was expecting the query to return just one record (record 3), but received an unexpected result.
I tried other approaches using SQL, such as subquery, nothing worked for me. So I am posting here in hope of advise, or just in case somebody has a better way of approaching this.