Apply match-like query on the result of aggregations

Good day all,

Given the below index:

PUT someindex/_doc/doc1
{
“name”:“person1”,
“roles”:[“engineer”,“consultant”]
}

How can get all roles that start with “eng”?

Some like the below SQL:
select distinct roles from someindex where roles like ‘eng%’

Is there a way in opensearch to get the above in DSL?

I tried match_phrase_prefix along with terms aggregation over the roles field but it returned both engineer and consultant. I even tried the SQL plugin and got the same result while I should logically be getting only engineer.

I understand that OpenSearch operates on a document level but I was wondering if there is a way to to apply a match kind of query on the aggregated result.

I appreciate your help.

Thanks

I think you want a wildcard search, from there you can build your aggregation:

GET someindex/_search
{
  "query": {
    "wildcard": {
      "roles": {
        "value": "eng*"
      }
    }
  }
}

You want to do an terms aggregation on roles, and filter by role name, correct?

This can be done by filtering on values. See Terms aggregation | Elasticsearch Guide [8.4] | Elastic

POST someindex/_search
{
  "size": 0, 
  "aggs": {
    "engineeringRoles": {
      "terms": {
        "field": "roles.keyword",
        "include": "eng.*", 
        "size": 10
      }
    }
  }
}

This will return

"buckets" : [
        {
          "key" : "engineer",
          "doc_count" : 1
        }
      ]
2 Likes

Perfect. This is exactly what I was looking for.

The question is, would the SQL plugin return this expected? I tried in OpenDistro 1.9 but it didn’t. I am not sure about the latest and greatest of OpenSearch.

@searchymcsearchface, should this be reported as a bug or a feature in case it doesn’t?

Regards

@asfoorial Indeed it doesn’t work properly on OpenSearch. I get back even values that don’t match eng*

{
  "from": 0,
  "size": 0,
  "timeout": "1m",
  "query": {
    "wildcard": {
      "roles.keyword": {
        "wildcard": "eng*",
        "boost": 1
      }
    }
  },
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ],
  "aggregations": {
    "composite_buckets": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "roles": {
              "terms": {
                "field": "roles.keyword",
                "missing_bucket": true,
                "order": "asc"
              }
            }
          }
        ]
      }
    }
  }
}

I would report this as a bug in OpenSearch/sql since it’s returning the wrong value.