OpenSearch unable to use pagination with nested item(s)

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

Unsure - AWS Opensearch Serverless, couldn’t find where to check the version apologies!

Describe the issue:

I have a document index which is called SalesOrder. It indexes - unsurprisingly - sales orders. Each sales order has one or more lines. These lines are indexed as nested items.

{
  "SalesOrder": {
    "Property1": "Value1",
    "Property2": "Value2",
    "LineItems": [
      {
        "LineItemProperty1": "Value1",
        "LineItemProperty2": "Value2"
      },
      {
        "LineItemProperty1": "Value3",
        "LineItemProperty2": "Value4"
      }
    ]
  }
}

To paginate, I’m using the from and size method from here: Paginate results - OpenSearch Documentation

It seems like the paginate in OpenSearch is limited to the root document which causes an issue when we search for sales order lines, because we’re doing a nested query and joining together outer documents (the sales orders) and inner items (the lines with backorder quantities on them). So each record in our response is comprised of:

"includes": [
  "customerNumber",
  "customerName",
  "erpOrderReference",
  "orderReference",
  "created",
  "orderSource",
  "orderType",
  "userCreated",
  "lineItems.sku",
  "lineItems.articleDescription",
  "lineItems.lineId",
  "lineItems.orderQuantity",
  "lineItems.backorderQuantity",
  "lineItems.expiryDate"
]

In a normal pagination of sales orders (no lineItems), if I say return me the first 2 records, I would get:

  • Sales Order 1:
  • Sales Order 2:

So far, so good. But when it comes to joining together back order lines and the orders they belong to, the pagination still only respects the outer document. What does this result in? Let’s say you again ask for only the first 2 records for the backorder query. You could get back something like:

  • Sales Order 1, Line 1
  • Sales Order 1, Line 2
  • Sales Order 1, Line 3
  • Sales Order 1, Line 4
  • Sales Order 2, Line 5

So you’re getting back 5 records (not what was asked for), but the pagination is respecting the document level by returning only orders 1 and 2 (2 records). How can I work around this? So that the pagination from/size parameters respect the nested items? So I only get back (in this example)

  • Sales Order 1, Line 1
  • Sales Order 1, Line 2

Configuration:

Relevant Logs or Screenshots:

1 Like

Hello buddy,

Check the blow code snippet for a solution based on flattening the data structure by denormalizing it. In this example, each line item is stored as a separate document with a reference to the parent sales order.

{
“SalesOrder”: {
“Property1”: “Value1”,
“Property2”: “Value2”
}
}

{
“LineItem”: {
“SalesOrderId”: “123”, // Reference to the parent sales order
“LineItemProperty1”: “Value1”,
“LineItemProperty2”: “Value2”
}
}

GET /your-index/_search
{
“query”: {
“match”: {
“LineItem.LineItemProperty1”: “Value1”
}
},
“size”: 2, // Number of records to return per page
“from”: 0 // Offset for pagination
}

this solution involves denormalizing the data, which might increase storage requirements. Additionally, it simplifies queries but may introduce some redundancy in the data. Choose this approach if it aligns with your specific requirements and trade-offs.

Achieve pagination with nested items directly without denormalizing the data, you can use the inner_hits feature in OpenSearch to get paginated results for nested documents. Here’s an example query:

GET /your-index/_search
{
“query”: {
“nested”: {
“path”: “SalesOrder.LineItems”,
“query”: {
“match_all”: {} // You can replace this with your specific query
},
“inner_hits”: {
“size”: 2, // Number of line items to return per page
“from”: 0 // Offset for pagination
}
}
}
}

Thanks

1 Like

Thanks so much @johnsonit I’m going to try one of those approaches and see how I go. If I run into an opensearch brick wall, I might be back!

1 Like