Aggregation help

Hello,

I’m new to opensearch and struggling to create a query and dashboard for my specific scenario. I have a dataset of orders with the following structure:

order_id order_status timestamp
1 started 01.01.2023
1 in_progress 02.01.2023
2 started 02.01.2023
1 complete 03.01.2023
2 in_progress 04.01.2023

What I need is the latest status of each order along with the count of those statuses. In SQL terms, the query would look like this:

sqlCopy code

WITH ranked AS (
    SELECT 
        order_id, 
        order_status, 
        ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY timestamp DESC) AS r
    FROM orders
)
SELECT order_status, COUNT(*) 
FROM ranked
WHERE r = 1
GROUP BY order_status;

I appreciate any guidance on how to achieve this in opensearch. Thank you!"

any help appreciated

If I get your requirement right, it would be a terms aggregation over the ordere_status field: Terms - OpenSearch documentation

This should get you the top N (default is 10, configurable via size) unique order_status values and their counts.

But I need to aggregate the latest status for each order first.

so lets say if
order1 has status started->progress-completed,
order2 has status started → progress,
order3 has status. started → progress

I get
order1 completed,
order2 progress,
order3 progress
as last statues. Then I need another agregation to count all.

completed 1,
progress 2

Aha, so what I said above covers the last bit. In order to get the orders and their last statuses, you’ll probably need a terms aggregation on order_id, then do a top hits aggregation where you sort by timestamp and show the status.

You can do both aggregations in one call. Here’s an (untested) example:

{
  "aggs": {
    "status_breakdown": {
      "terms": {
        "field": "order_status",
        "size": 5
      }
    },
    "top_orders": {
      "terms": {
        "field": "order_id",
        "size": 100
      },
      "aggs": {
        "last_status": {
          "top_hits": {
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [ "order_status" ]
            },
            "size": 1
          }
        }
      }
    }
  }
}

This should get you 100 orders sorted by the number of documents you have per order, which may not be what you want. If you need to get the last 100 orders you’ll need another sub-aggregation to get the most recent date and sort your order buckets based on that. You have an example here: Top hits aggregation | Elasticsearch Guide [8.11] | Elastic

Now that I say it, collapse is another option, which should be faster: Collapse search results | Elasticsearch Guide [7.10] | Elastic

Thanks for the explanations. but i still couldn’t find a correct query.
So,

POST orders/_doc
{
  "order_id": "1",
  "status": "started",
  "index": 1
}

POST orders/_doc
{
  "order_id": "1",
  "status": "progress",
  "index": 2
}

POST orders/_doc
{
  "order_id": "1",
  "status": "completed",
  "index": 3
}

POST orders/_doc
{
  "order_id": "2",
  "status": "started",
  "index": 1
}

POST orders/_doc
{
  "order_id": "2",
  "status": "progress",
  "index": 2
}

POST orders/_doc
{
  "order_id": "3",
  "status": "started",
  "index": 1
}

POST orders/_doc
{
  "order_id": "3",
  "status": "progress",
  "index": 2
}

here i used index instead of timestamp for simplicity.

if i run your query

GET /orders/_search
{
  "aggs": {
    "status_breakdown": {
      "terms": {
        "field": "status.keyword",
        "size": 5
      }
    },
    "top_orders": {
      "terms": {
        "field": "order_id.keyword",
        "size": 100
      },
      "aggs": {
        "last_status": {
          "top_hits": {
            "sort": [
              {
                "index": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [ "status" ]
            },
            "size": 1
          }
        }
      }
    }
  }
}

My response is

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "orders",
        "_id": "PDaQuosBg8S_kochBrsW",
        "_score": 1,
        "_source": {
          "order_id": "1",
          "status": "started",
          "index": 1
        }
      },
      {
        "_index": "orders",
        "_id": "PjaQuosBg8S_kochBrte",
        "_score": 1,
        "_source": {
          "order_id": "1",
          "status": "progress",
          "index": 2
        }
      },
      {
        "_index": "orders",
        "_id": "PzaQuosBg8S_kochBrt-",
        "_score": 1,
        "_source": {
          "order_id": "1",
          "status": "completed",
          "index": 3
        }
      },
      {
        "_index": "orders",
        "_id": "QDaQuosBg8S_kochBru-",
        "_score": 1,
        "_source": {
          "order_id": "2",
          "status": "started",
          "index": 1
        }
      },
      {
        "_index": "orders",
        "_id": "QTaQuosBg8S_kochBrvb",
        "_score": 1,
        "_source": {
          "order_id": "2",
          "status": "progress",
          "index": 2
        }
      },
      {
        "_index": "orders",
        "_id": "QjaQuosBg8S_kochBrv3",
        "_score": 1,
        "_source": {
          "order_id": "3",
          "status": "started",
          "index": 1
        }
      },
      {
        "_index": "orders",
        "_id": "QzaQuosBg8S_kochB7sX",
        "_score": 1,
        "_source": {
          "order_id": "3",
          "status": "progress",
          "index": 2
        }
      }
    ]
  },
  "aggregations": {
    "top_orders": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 3,
          "last_status": {
            "hits": {
              "total": {
                "value": 3,
                "relation": "eq"
              },
              "max_score": null,
              "hits": [
                {
                  "_index": "orders",
                  "_id": "PzaQuosBg8S_kochBrt-",
                  "_score": null,
                  "_source": {
                    "status": "completed"
                  },
                  "sort": [
                    3
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "2",
          "doc_count": 2,
          "last_status": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": null,
              "hits": [
                {
                  "_index": "orders",
                  "_id": "QTaQuosBg8S_kochBrvb",
                  "_score": null,
                  "_source": {
                    "status": "progress"
                  },
                  "sort": [
                    2
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "3",
          "doc_count": 2,
          "last_status": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": null,
              "hits": [
                {
                  "_index": "orders",
                  "_id": "QzaQuosBg8S_kochB7sX",
                  "_score": null,
                  "_source": {
                    "status": "progress"
                  },
                  "sort": [
                    2
                  ]
                }
              ]
            }
          }
        }
      ]
    },
    "status_breakdown": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "progress",
          "doc_count": 3
        },
        {
          "key": "started",
          "doc_count": 3
        },
        {
          "key": "completed",
          "doc_count": 1
        }
      ]
    }
  }
}

Status breakdown buckets are not correct.
I need completed 1, progress 2

Can you help me correct the query?

I can the latest status for each order with query below, but i don’t know how to get counts of each status after that.

GET orders/_search
{
  "size": 0,
   "aggs": {
     "order_group": {
        "terms": {
          "field": "order_id.keyword"
        },
        "aggs": {
          "latest_status": {
            "top_hits": {
              "size": 1,
              "sort": [
                    {
                       "index" : {
                            "order": "desc"
                       }
                    }
                ]
            }
          }
        }

     }
   }
}

Hmmm… I don’t know of a good solution to do that :frowning:

Because I think what you need to do is to get all those top_hits. Which implies that the order_group aggregation has to be of a huge size, enough to get all the groups. If you have many groups, you’ll have to break down the request into multiple requests with partitions. But then you’d have to count these orders with the application, I don’t know of a built-in way to do that.

Unless there’s a completely different approach that I’m not seeing…