SQL LEFT JOIN with WHERE issue

I have this issue, reproduced simply bellow:

Create 2 indices test-a and test-b. Each has a unique record, not existing on the other.

ESHOST=http://localhost:9200
CURLOPT="-s -u admin:admin"

# create test-a with 3 records: 1, 2, 3 -- not including record 4
curl --location --request POST '${ESHOST}/test-a/_bulk?pretty' \
$CURLOPT \
--header 'Content-Type: application/json' \
--data-raw '{"index":{"_id":"1"}}
{"iCCID":"1"}
{"index":{"_id":"2"}}
{"iCCID":"2"}
{"index":{"_id":"3"}}
{"iCCID":"3"}'

# create test-b with 3 records: 1, 2, 4 -- not including record 3
curl --location --request POST '${ESHOST}/test-b/_bulk?pretty' \
$CURLOPT \
--header 'Content-Type: application/json' \
--data-raw '{"index":{"_id":"1"}}
{"iCCID":"1"}
{"index":{"_id":"2"}}
{"iCCID":"2"}
{"index":{"_id":"3"}}
{"iCCID":"4"}'

LEFT JOIN on those indices seemed promising at first:

ESHOST=http://localhost:9200
CURLOPT="-s -u admin:admin"

curl --location --request POST '${ESHOST}/_opendistro/_sql' \
$CURLOPT \
--header 'Content-Type: application/json' \
--data-raw '{"query" : "SELECT a.iCCID iCCID, b.iCCID iCCID_b FROM test-a a LEFT JOIN test-b b ON a.iCCID = b.iCCID"}'

Response from LEFT JOIN looks correct:

{
    "schema": [
        {
            "name": "a.iCCID",
            "alias": "iCCID",
            "type": "text"
        },
        {
            "name": "b.iCCID",
            "alias": "iCCID_b",
            "type": "text"
        }
    ],
    "total": 3,
    "datarows": [
        [
            "1",
            "1"
        ],
        [
            "2",
            "2"
        ],
        [
            "3",
            null
        ]
    ],
    "size": 3,
    "status": 200
}

So I go on trying a LEFT JOIN, this time adding a WHERE condition:

ESHOST=http://localhost:9200
CURLOPT="-s -u admin:admin"

curl --location --request POST '${ESHOST}/_opendistro/_sql' \
$CURLOPT \
--header 'Content-Type: application/json' \
--data-raw '{"query" : "SELECT a.iCCID iCCID, b.iCCID iCCID_b FROM test-a a LEFT JOIN test-b b ON a.iCCID = b.iCCID where b.iCCID is null"}'

Response from LEFT JOIN with WHERE condition:

{
    "schema": [
        {
            "name": "a.iCCID",
            "alias": "iCCID",
            "type": "text"
        },
        {
            "name": "b.iCCID",
            "alias": "iCCID_b",
            "type": "text"
        }
    ],
    "total": 3,
    "datarows": [
        [
            "2",
            null
        ],
        [
            "3",
            null
        ],
        [
            "1",
            null
        ]
    ],
    "size": 3,
    "status": 200
}

Here, i was expecting the query to return just one record (record 3), but received an unexpected result.

I tried other approaches using SQL, such as subquery, nothing worked for me. So I am posting here in hope of advise, or just in case somebody has a better way of approaching this.

@j.f Thanks for reporting the issue! Unfortunately, post-filter/aggregate after join operation is not supported yet. We’re evaluating and will be adding more comprehensive JOIN support in our new query engine in future. Thanks!

@daichen Are you sure? I have a similar problem and found this in the documentation where it explicitly says that one can use the SELECT FROM LEFT JOIN WHERE clause within certain contraints:

[Complex Queries - Open Distro Documentation] (Complex Queries - Open Distro Documentation)

An issue for your reference: LEFT JOIN with WHERE doesn't filter the result as expected · Issue #124 · opendistro-for-elasticsearch/sql · GitHub

1 Like

@daichen Thank you for the reply! So is there no way of postfiltering? After a Join-clause?

Yes, this is what we want to improve in the SQL Engine V2: GitHub - opendistro-for-elasticsearch/sql: 🔍 Open Distro SQL Plugin. The JOIN in new engine should be able to support multi-join, post-filter/agg/sort etc. Please stay tuned to our GitHub repo. Thanks!