How to include "missing" values in aggregations

tlacuache · January 24, 2024, 11:16pm

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch 2.11.1
opensearch-py 2.4.2

Describe the issue:

This should be pretty simple to answer, I think, but I'm struggling to make sense of the documentation at the moment, so I thought I'd ask for help here.

Essentially, I'm trying to do the same thing Dashboards does when you create a table with multiple fields and turn on the “show missing” option in the UI. For example, in one table I have “show missing” enabled for the “Protocol Version” field with the missing-value label set to ‘-’, and when I run it, the CSV I get looks like this:

"Application Protocol","Protocol Version",Count
tls,"-",2

In Dashboards, I can have it show me the request JSON:

{
  "aggs": {
    "2": {
      "terms": {
        "field": "network.protocol",
        "order": {
          "_count": "desc"
        },
        "size": 100
      },
      "aggs": {
        "3": {
          "terms": {
            "field": "network.protocol_version",
            "order": {
              "_count": "desc"
            },
            "missing": "__missing__",
            "size": 100
          }
        }
      }
    }
  },
...
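For reference, the nested body above can be generated for any list of fields with a small helper. This is a minimal sketch in plain Python, assuming the resulting dict is sent as the `body` argument of `OpenSearch.search` in opensearch-py. The function name `build_terms_aggs` is hypothetical, and the `values_N` aggregation names mirror the naming in the loop from my code rather than Dashboards' numeric IDs. The top-level `"size": 0` is also an assumption: it skips returning hits, since only the buckets matter here. Every level gets the same `missing` key, so documents lacking any of the fields still land in a bucket.

```python
from typing import Any, Dict, List


def build_terms_aggs(
    fields: List[str],
    size: int = 100,
    missing: str = "__missing__",
) -> Dict[str, Any]:
    """Nest one terms aggregation per field, outermost field first.

    Every level sets "missing", so documents without that field are
    counted under the `missing` key instead of being dropped.
    """
    aggs: Dict[str, Any] = {}
    # Build from the innermost field outwards so each level wraps the next.
    for depth in range(len(fields), 0, -1):
        level: Dict[str, Any] = {
            "terms": {
                "field": fields[depth - 1],
                "order": {"_count": "desc"},
                "missing": missing,
                "size": size,
            }
        }
        if aggs:
            # Attach the already-built deeper levels as sub-aggregations.
            level["aggs"] = aggs
        aggs = {f"values_{depth}": level}
    # size=0: return only aggregation results, no document hits.
    return {"size": 0, "aggs": aggs}
```

Calling `build_terms_aggs(["network.protocol", "network.protocol_version"])` produces the same two-level shape as the Dashboards request, with `missing` set on both levels. Pick a sentinel that can never occur as a real field value, otherwise genuine values and missing ones become indistinguishable.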

In my code (see the link to GitHub here), I loop over the list of fields and chain together .bucket calls:

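    # s is the Search object; each .bucket() call nests a new terms
    # aggregation inside the previous one, one level per field
    last_bucket = s.aggs
    for aggCount, fname in enumerate(get_iterable(fieldnames), start=1):
        last_bucket = last_bucket.bucket(
            f"values_{aggCount}",
            "terms",
            field=fname,
            size=bucket_limit,
        )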

I found the Missing aggregations documentation, but I can't figure out how to translate it to the Python library. I've tried adding arguments to my .bucket call like this:

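        last_bucket = last_bucket.bucket(
            f"values_{aggCount}",
            "terms",
            field=fname,
            size=bucket_limit,
            # bucket key used for documents that lack this field
            missing="__missing__",
            # also return terms that have zero matching documents
            min_doc_count=0,
        )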

but that does not seem to change the output I get. I still don’t get the missing buckets.

Can somebody help me out? I want to return a bucket for the “missing” value the same way those Dashboards tables do, regardless of the aggregation level (i.e., first column, second column, whatever).

Thanks.

tlacuache · January 24, 2024, 11:24pm

Or maybe there's an easier way to do this altogether? I read something about a “faceted queries” abstraction, but it didn't make much sense to me, and I couldn't tell whether it's what I'm trying to do.

tlacuache · January 24, 2024, 11:35pm

Ugh, it's been a long day. I may have had a persistence problem with the container I'm testing this in, so I wasn't actually seeing the results of my code changes as I tried things. After fully wiping and restarting my whole system, I believe I'm now getting what I expect, thanks to adding the missing argument to .bucket.

I'm going to do some more testing, but I think it is working. Sorry about the noise; maybe this will be useful to someone else later.
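Once `missing` is set, the response contains buckets whose key is the sentinel string, nested one level per field. A minimal sketch for turning them into CSV-style rows like the Dashboards export, assuming the plain response dict returned by `OpenSearch.search` and the `values_N` aggregation names used in the loop above; `flatten_buckets` is a hypothetical helper, not a library function.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten_buckets(
    aggs: Dict[str, Any],
    depth: int = 1,
    missing: str = "__missing__",
    label: str = "-",
    prefix: Tuple[Any, ...] = (),
) -> Iterator[Tuple[Any, ...]]:
    """Yield (value_1, ..., value_n, count) rows from nested terms buckets."""
    agg = aggs.get(f"values_{depth}")
    if agg is None:
        # No aggregation at this depth, so there is nothing to emit.
        return
    for bucket in agg["buckets"]:
        # Swap the sentinel key for a display label, like Dashboards' "-".
        value = label if bucket["key"] == missing else bucket["key"]
        row = prefix + (value,)
        if f"values_{depth + 1}" in bucket:
            # Descend into the next field's buckets, carrying this row's values.
            yield from flatten_buckets(bucket, depth + 1, missing, label, row)
        else:
            # Innermost level: append the document count.
            yield row + (bucket["doc_count"],)
```

For example, `list(flatten_buckets(response["aggregations"]))` on a response matching the CSV at the top of the thread would yield `("tls", "-", 2)`, and each row can be passed straight to `csv.writer(...).writerow`.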

system · March 24, 2024, 11:36pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Replies	Views	Activity
Missing bucket not shown for rollup index aggregations (OpenSearch)	0	19	April 9, 2025
Calculate the percentage of the total documents across all buckets (OpenSearch, troubleshoot)	0	74	November 21, 2024
Transform job aggregations for missing field behavior (OpenSearch)	1	431	December 4, 2024
Aggregation help (OpenSearch)	7	753	November 13, 2023
Unexpected behavior when aggregating using sampler (OpenSearch, troubleshoot)	0	23	October 9, 2024