[DSL] Get more info for each aggregated bucket

A question from @cnik, Get more info for each aggregated bucket


Hi, I am trying to create an aggregation by using the terms field. This creates buckets based on the provided field but I want to get more data for each bucket.

To be more specific I have user data, where I have a user_id as well as a username, and other data that are unique to the user.

I am trying to aggregate data over a specific period so I get:

  1. the total number of users that did a particular action over that time-span
  2. a histogram of how many users did the action over that time-span per day
  3. the number of times each user did the action over that time-span per day

In (3.) I am creating a bucket for each user based on the user_id (by using the termsfield) but I would also like to get the additional user info, like the username, etc.

I cannot find any way to do that.

To check the number of users and get a list of users who did a specific action over the time-span I have the following query:

{
    "size": 0,
    "track_total_hits": true,
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "action_type_hash.keyword": "10637ef194438230be3d8e6158bc83712573b8db07809a5c956d3fb301ee233f"
                    }
                }
            ],
            "filter": [
                {
                    "range": {
                        "active": {
                            "gte": 1655892938,
                            "lt": 1656414938
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "unique_users": {
            "cardinality": {
                "field": "user.id"
            }
        },
        "hist": {
            "histogram": {
                "field": "action_time",
                "min_doc_count": 0,
                "interval": 3600
            },
            "aggs": {,
                "users": {
                    "terms": {
                        "field": "user.id",
                        "size": 50
                  }
                }
            }
        }
    }
}

But this returns the buckets with the user_id only and the count.

@joshli - Were you able to resolve this, as I’m facing a similar problem and wanted to check if you got any way out

This questions is actually from @cnik, i copied it here. Have you looked into using painless script for this or adding a nested terms under user.id to return username?

I’m not working on DSL and my suggestions might not work (would be helpful if someone from opensearch core can help here), but if you have some sample mapping and docs, I can give it a try