Is PIT with search slicing the best method to paginate?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.6

Describe the issue:
I have more than 1 million hits.
To paginate them and jump from one page to a non-consecutive page, I used search slicing with PIT, which takes a slice ID and a max value (the maximum number of slices, i.e. pages). My questions are:
To calculate the number of pages, I need the total hit count captured when the PIT ID was created. This total will help me decide the max value for search slicing.
Consider the example below. I have a total of 49,000 hits, and with PIT and search slicing I wrote the query as follows:
REQUEST:
GET /_search
{
  "slice": {
    "id": 4,
    "max": 5
  },
  "pit": {
    "id": "w97JyWEFoNzJOWUJGxdHN1QQAWMWN3TVhsby1RdUtHLUx"
  }
}

RESPONSE:
{
  "pit_id": "w97JyWEFoNzJOWUJGxdHN1QQAWMWN3TVhsby1RdUtHLUx",
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 9587,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "unitcase-2023.08.09",
        "_id": "mJqZ3IkBzwDCUDA2eXbl",
        "_score": 1,
        "_source": {
          "@timestamp": "2023-08-09T23:19:50.794Z",
          "log": "Reboot failed",
          "service_name": "Hostdb"
        }
      }
The response above shows a total value of 9587, but I got only 10 hits. Do I need to specify the size field when using search slicing?
Is PIT with search slicing the best method to paginate?

Hi Raji,

Yes, you can specify size to control how many hits you get back per page. You can also get the next page with search_after; see the Point in Time page in the OpenSearch documentation.
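
To make that concrete, here is a minimal sketch in Python of how the request bodies for consecutive pages could be built. The helper names (build_page_request, last_sort_values), the sort on "@timestamp", the "5m" keep_alive, the "<pit-id>" placeholder and the hand-written response fragment are all assumptions for illustration; nothing here talks to a cluster. Depending on your data, you may need an extra tiebreaker field in the sort.

```python
from typing import Any, Optional


def build_page_request(
    pit_id: str,
    size: int,
    sort: list,
    search_after: Optional[list] = None,
    keep_alive: str = "5m",
) -> dict:
    """Build a GET /_search body for one page against an existing PIT."""
    body: dict[str, Any] = {
        "size": size,  # upper bound on hits returned for this page
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,  # search_after needs an explicit sort
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body


def last_sort_values(response: dict) -> Optional[list]:
    """Return the sort values of the last hit, or None when the page is empty."""
    hits = response["hits"]["hits"]
    return hits[-1]["sort"] if hits else None


# Hypothetical sort clause; "@timestamp" is the field from the sample document.
sort = [{"@timestamp": {"order": "asc"}}]
first = build_page_request("<pit-id>", 1000, sort)

# Hand-written response fragment (not real output), just to show the flow.
fake_response = {"hits": {"hits": [{"_id": "a", "sort": [1691623190794]}]}}
second = build_page_request("<pit-id>", 1000, sort, last_sort_values(fake_response))
print(second["search_after"])
```

Note that search_after only moves forward from the last hit you have seen, so reaching a non-consecutive page this way means walking through the pages in between.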

Whether PIT with slices is the best way to paginate depends on exactly what you need to achieve.

Hi Radu,

Thanks for the clarification. With search slicing, the size value is not working; please look at the query below.
My total hit count is 37,836.
When I use search slicing with PIT and a max of 100 pages, the slicing automatically splits the hits across the 100 pages, and the size parameter provided in the query has no effect.

Query:
GET /_search
{
  "slice": {
    "id": 1,
    "max": 100
  },
  "size": 5000,
  "pit": {
    "id": "w979QAkScxxxxxxxxxxxxxxxxxxxxxd181VjZhUQAA"
  }
}

RESPONSE:
"hits": {
  "total": {
    "value": 3435,
    "relation": "eq"

For id: 2 the value is 3256. Likewise, the hits value keeps changing from slice to slice, and the size parameter is not taken into account.

Hi Raji,

If you have 100 slices, then you'll have an average of 3.7K results per slice. If you ask for a page size of 5000, you won't get the full page, just the share of 3.7K. I think that's OK behavior because "size" effectively means "get top N documents", but if there are only M<N documents available, you'll get M.
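
A toy simulation in Python may help picture both effects: slices hold uneven shares of the documents, and size only caps what a single request returns. The CRC32-modulo partition below is just an illustrative stand-in, not how OpenSearch actually assigns documents to slices, and the document IDs are synthetic; the helper names are made up for this sketch.

```python
import zlib


def slice_counts(doc_ids, max_slices):
    """Count how many ids land in each slice under a toy hash partition."""
    counts = [0] * max_slices
    for doc_id in doc_ids:
        # Illustrative stand-in only: not OpenSearch's real slicing logic.
        counts[zlib.crc32(doc_id.encode("utf-8")) % max_slices] += 1
    return counts


def hits_returned(slice_total, size=10):
    """size is an upper bound: you get min(size, documents in the slice)."""
    return min(slice_total, size)


# 49,000 synthetic ids split into 5 slices, as in the first example.
doc_ids = [f"doc-{i}" for i in range(49_000)]
counts = slice_counts(doc_ids, 5)
print(counts)       # five uneven counts that add up to 49,000
print(sum(counts))  # 49000

print(hits_returned(9587))             # 10   (size left at its default of 10)
print(hits_returned(3435, size=5000))  # 3435 (fewer documents than size)
```

So hits.total.value reports how many documents matched within the slice, while the number of hits actually returned is bounded by size.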
