How to include "missing" values in aggregations

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

  • OpenSearch 2.11.1
  • opensearch-py 2.4.2

Describe the issue:

This should be pretty simple to answer, I think, but I’m having trouble making sense of the documentation at the moment, so I thought I’d ask for help here.

Essentially what I’m trying to do is the same thing Dashboards does when you create a table with multiple fields and turn on the “show missing” option in the UI. For example, in one particular table I have “show missing” turned on for the “Protocol Version” field with the “show missing” value set to ‘-’, and when it runs, the CSV I get looks like this:

"Application Protocol","Protocol Version",Count
tls,"-",2

In Dashboards I can have it show me the request JSON:

{
  "aggs": {
    "2": {
      "terms": {
        "field": "network.protocol",
        "order": {
          "_count": "desc"
        },
        "size": 100
      },
      "aggs": {
        "3": {
          "terms": {
            "field": "network.protocol_version",
            "order": {
              "_count": "desc"
            },
            "missing": "__missing__",
            "size": 100
          }
        }
      }
    }
  },
...

In my code (see the link to GitHub here) essentially I am looping over the lists of fields and chaining together .bucket calls:

    last_bucket = s.aggs
    aggCount = 0
    for fname in get_iterable(fieldnames):
        aggCount += 1
        last_bucket = last_bucket.bucket(
            f"values_{aggCount}",
            "terms",
            field=fname,
            size=bucket_limit,
        )

I found the Missing aggregations document, but I can’t figure out how to translate it to the Python library. I’ve tried adding arguments to my call to .bucket like this:

        last_bucket = last_bucket.bucket(
            f"values_{aggCount}",
            "terms",
            field=fname,
            size=bucket_limit,
            missing="__missing__",
            min_doc_count=0,
        )

but that does not seem to change the output I get. I still don’t get the missing buckets.
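For comparison, here is a minimal sketch of the request body I would expect that loop to produce, written as plain dicts so it runs without a cluster or the client library. The names build_nested_terms and missing_value are made up for illustration, and the two field names are the ones from the Dashboards request above. One assumption worth flagging: as far as I know the "missing" placeholder has to be compatible with the field type, so a string like "__missing__" is fine for keyword fields but not for numeric ones.

```python
def build_nested_terms(fieldnames, bucket_limit=100, missing_value="__missing__"):
    """Build nested terms aggregations as plain dicts (hypothetical helper).

    Each field becomes a terms aggregation nested inside the previous one,
    and every level gets the same "missing" placeholder.
    """
    root = {}
    current = root
    for agg_count, fname in enumerate(fieldnames, start=1):
        terms_agg = {
            "terms": {
                "field": fname,
                "size": bucket_limit,
                # Documents without this field are counted under this key.
                "missing": missing_value,
            }
        }
        # Attach this level under the previous one, then descend into it.
        current["aggs"] = {f"values_{agg_count}": terms_agg}
        current = terms_agg
    return root


body = build_nested_terms(["network.protocol", "network.protocol_version"])
body["size"] = 0  # only the aggregation buckets are needed, not the hits
```

Printing body (or the to_dict() output of the Search object) and comparing it with the request JSON that Dashboards shows is a quick way to check whether the missing argument actually made it into the query being sent.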

Can somebody help me out? I want to return a bucket with the “missing” value the same way those tables in Dashboards do, regardless of the level of the aggregation (i.e., first column, second column, whatever).

Thanks.

Or maybe there’s an easier way to do this altogether? I read something about a “faceted queries” abstraction but it didn’t make a ton of sense to me as to whether that’s what I was trying to do.

Ugh, it’s been a long day. I think I had a persistence problem with the container I’m testing in, so I wasn’t actually seeing the results of my code changes as I tried things. After I fully wiped and restarted my whole system, I believe I’m getting what I expect now, based on the addition of the missing argument to bucket.

I’m going to do some more testing, but I think it is working. Sorry about the noise; maybe this will be useful to someone else later, though.
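In case it helps anyone who lands here later, this is a rough sketch of how the nested buckets can be turned back into rows like the Dashboards CSV, with the placeholder swapped for ‘-’. It assumes the aggregations are named values_1, values_2, and so on, as in my loop above. The flatten_buckets helper is hypothetical, and the sample response is hand-written to mirror the CSV row from my first post, not copied from a real cluster.

```python
def flatten_buckets(aggs, depth=1, prefix=()):
    """Yield (key_1, ..., key_n, doc_count) tuples from nested terms buckets."""
    for bucket in aggs[f"values_{depth}"]["buckets"]:
        keys = prefix + (bucket["key"],)
        if f"values_{depth + 1}" in bucket:
            # There is a deeper aggregation level, so keep descending.
            yield from flatten_buckets(bucket, depth + 1, keys)
        else:
            # Innermost level: emit the collected keys plus the count.
            yield keys + (bucket["doc_count"],)


# Hand-written sample shaped like the "aggregations" part of a response.
sample = {
    "values_1": {
        "buckets": [
            {
                "key": "tls",
                "doc_count": 2,
                "values_2": {
                    "buckets": [{"key": "__missing__", "doc_count": 2}]
                },
            }
        ]
    }
}

# Replace the placeholder with the display value used in the Dashboards table.
rows = [
    tuple("-" if value == "__missing__" else value for value in row)
    for row in flatten_buckets(sample)
]
print(rows)  # [('tls', '-', 2)]
```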