Unique item counting issue in index

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
Running docker version.
“version” : {
“distribution” : “opensearch”,
“number” : “3.1.0”,
“build_type” : “tar”,
“build_hash” : “8ff7c6ee924a49f0f59f80a6e1c73073c8904214”,
“build_date” : “2025-06-21T08:05:43.345081313Z”,
“build_snapshot” : false,
“lucene_version” : “10.2.1”,
“minimum_wire_compatibility_version” : “2.19.0”,
“minimum_index_compatibility_version” : “2.0.0”

Describe the issue:

When executing this:

POST /_plugins/_sql
{
“query”:
“SELECT count(Id) FROM table1”
}

Opensearch returns:

{
“schema”: [
{
“name”: “count(Id)”,
“type”: “integer”
}
],
“datarows”: [
[
20000
]
],
“total”: 1,
“size”: 1,
“status”: 200
}

However, if you do “SELECT count(distinct Id) FROM table1”, it returns:

{
“schema”: [
{
“name”: “count(distinct Id)”,
“type”: “integer”
}
],
“datarows”: [
[
20015
]
],
“total”: 1,
“size”: 1,
“status”: 200
}

So why counting distinct returns 20015 which is > total records# 20000 ?

p.s. The Id (numeric) column has no issue, neat and consistent without any special characters.

@summerist.l OpenSearch SQL uses the cardinality aggregation under the hood, which comes with an estimator which has a small relative error. More details can be found here.

You can _explain api to see the translation, see below:

POST _plugins/_sql/_explain
{
  "query": "SELECT COUNT(DISTINCT Id) FROM table1"
}

You can also increase precision using precision_threshold, see following example:

POST table1/_search
{
  "size": 0,
  "aggs": {
    "uniq_ids": {
      "cardinality": {
        "field": "Id",
        "precision_threshold": 40000
      }
    }
  }
}

Hope this helps

1 Like

@Anthony It is solved. Thank you so much!

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.