Multiple search criteria across different documents sharing a common key

Versions: (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch 3.7.0.

Describe the issue: I have multiple documents which share the same «business ID». Using multiple search criteria, which can be satisfied by distinct documents, how can I find all the matching «business ID»s?

Given this index mapping:

PUT myindex
{
  "mappings": {
    "properties": {
      "business_id": {
        "type": "keyword"
      },
      "color": {
        "type": "keyword"
      }
    }
  }
}

and the following data:

POST myindex/_doc
{
  "business_id": "123",
  "color": "red"
}

POST myindex/_doc
{
  "business_id": "123",
  "color": "cerise"
}

POST myindex/_doc
{
  "business_id": "456",
  "color": "cerulean"
}

POST myindex/_doc
{
  "business_id": "789",
  "color": "red"
}

I tried this query:

GET myindex/_search
{
  "query": {
    
    "bool": {
      "should": [
        {
          "term": {
            "color": {
              "value": "red"
            }
          }
        },
        {
          "term": {
            "color": {
              "value": "cerise"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "size": 0, 
  "aggs": {
    "business_ids": {
      "terms": {
        "field": "business_id",
        "size": 10,
        "min_doc_count": 2
      }
    }
  }
}

It works, but:

  1. This feels overly complicated. Is there a more concise way to express the same query?
  2. Should I be worried about the aggregation performance?

Configuration: OpenSearch and OpenSearch Dashboards started locally as Docker containers with docker run -d --name opensearch -p 9200:9200 -e 'DISABLE_SECURITY_PLUGIN=true' -e 'discovery.type=single-node' opensearchproject/opensearch and docker run -d --name dashboards -p 5601:5601 -e 'DISABLE_SECURITY_DASHBOARDS_PLUGIN=true' -e 'OPENSEARCH_HOSTS=["http://<my local IP address>:9200"]' opensearchproject/opensearch-dashboards, respectively.

Hi @danilopiazza,

Thanks for providing a clear example.

Yes. Since both conditions are exact matches on the same keyword field, you can replace the two term clauses with a single terms query:

GET myindex/_search
{
  "size": 0,
  "query": {
    "terms": {
      "color": ["red", "cerise"]
    }
  },
  "aggs": {
    "business_ids": {
      "terms": {
        "field": "business_id",
        "size": 10,
        "min_doc_count": 2
      }
    }
  }
}

For the sample shown, aggregation performance should not be a concern. At a larger scale, the main factors are the number of matching documents, the number of unique business_id values, and the number of buckets requested.

One thing to note: this approach assumes that each color appears at most once for a given business_id. If duplicate color values are possible, min_doc_count: 2 only guarantees that two documents matched either red or cerise; it does not guarantee that both colors are present.

Also, "size": 10 returns at most 10 business ID buckets, not all matching business IDs. If you need to retrieve every matching business ID from a large result set, consider using a paginated composite aggregation rather than setting a very large terms.size.