Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch 2.15
Describe the issue:
Hi all,
I have an OpenSearch index named vector-index with the following mapping. This index stores document chunks from a parent index to enable semantic search:
{
  "mappings": {
    "properties": {
      "orig_id": { // Reference to parent document’s _id
        "type": "keyword"
      },
      "chunk_number": {
        "type": "integer"
      },
      "text": {
        "type": "text"
      },
      "embedding": {
        "type": "knn_vector",
        ...
      },
      ...
    }
  }
}
This index works fine for semantic search, but the UI needs to display links to full parent documents instead of chunks. Directly returning chunks causes duplicate parent documents. We came up with the following approach.
- Perform semantic search on vector-index, but use the collapse field to deduplicate the results by orig_id.
- Using the orig_ids retrieved from the query above, query the parent index to get the associated documents.
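The two steps above can be sketched in Python roughly as follows. This is only an illustration: the helper names (`build_chunk_query`, `extract_orig_ids`, `build_parent_lookup`) are made up, and `knn_query` stands in for the query clause that is elided in this post. The functions only build request bodies and read a response dict, so they do not assume any particular client library.

```python
from typing import Any


def build_chunk_query(knn_query: dict[str, Any], size: int = 10) -> dict[str, Any]:
    """Step 1: search body for vector-index, collapsed on orig_id."""
    return {
        "size": size,
        "query": knn_query,
        "_source": ["orig_id"],  # only the parent reference is needed here
        "collapse": {"field": "orig_id"},
    }


def extract_orig_ids(response: dict[str, Any]) -> list[str]:
    """Collect the parent ids from the hits, keeping the ranking order."""
    return [hit["_source"]["orig_id"] for hit in response["hits"]["hits"]]


def build_parent_lookup(orig_ids: list[str]) -> dict[str, Any]:
    """Step 2: search body for the parent index, matching documents by _id."""
    return {"size": len(orig_ids), "query": {"ids": {"values": orig_ids}}}
```

The parent hits from step 2 are not guaranteed to come back in the same order as the chunks, so they would need to be re-sorted by the position of each _id in the list from step 1.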
My query with collapsing for vector-index is below (simplified).
{
  "from": 0,
  "size": 10,
  "query": ...,
  "sort": [
    {
      "_score": "desc"
    },
    {
      "_id": "asc"
    }
  ],
  "collapse": {
    "field": "orig_id"
  }
}
This works initially, but we are encountering pagination issues, perhaps due to the approximate nature of KNN search. Now we are trying to implement infinite scrolling with search_after, but we encountered an error. Here is the original query again, but with search_after.
{
  "size": 10,
  "query": ...,
  "sort": [
    {
      "_score": "desc"
    },
    {
      "_id": "asc"
    }
  ],
  "collapse": {
    "field": "orig_id"
  },
  "search_after": ["orig_id1"]
}
When I execute this query, I get the following error:
cannot use `collapse` in conjunction with `search_after`
My questions are:
- Why can’t collapse and search_after coexist? My guess is that when the results are retrieved, search_after starts from the passed value, and collapse then deduplicates the overall result by orig_id. Or am I missing something here?
- If collapse can’t work with search_after, is there a way to achieve what we want in OpenSearch? We need infinite scroll with deduplicated parent documents.
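For context, the fallback we are considering is to drop collapse, page through the chunks with search_after, and deduplicate by orig_id on the client. Below is a rough Python sketch of that idea, not tested against a real cluster. `search` is a hypothetical callable that takes a request body and returns the response as a dict, and `next_unique_page` is a made-up name. The cursor passed as search_after is the `sort` array of the last hit consumed, so it carries one value per sort key.

```python
from typing import Any, Callable, Optional

SearchFn = Callable[[dict[str, Any]], dict[str, Any]]


def next_unique_page(
    search: SearchFn,
    query: dict[str, Any],
    seen: set[str],
    after: Optional[list[Any]] = None,
    page_size: int = 10,
    batch_size: int = 50,
) -> tuple[list[str], Optional[list[Any]]]:
    """Return up to page_size unseen orig_ids plus the cursor for the next call."""
    page: list[str] = []
    while len(page) < page_size:
        body: dict[str, Any] = {
            "size": batch_size,
            "query": query,
            "_source": ["orig_id"],
            "sort": [{"_score": "desc"}, {"_id": "asc"}],
        }
        if after is not None:
            body["search_after"] = after
        hits = search(body)["hits"]["hits"]
        if not hits:
            return page, None  # no more chunks to read
        for hit in hits:
            after = hit["sort"]  # advance the cursor one hit at a time
            orig_id = hit["_source"]["orig_id"]
            if orig_id not in seen:
                seen.add(orig_id)
                page.append(orig_id)
                if len(page) == page_size:
                    break  # resume after this hit on the next call
    return page, after
```

The caller would keep `seen` and the returned cursor between scroll events. One call may issue several requests when many chunks share a parent, and `seen` grows with the number of parents already shown.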
Thanks!
Configuration:
Relevant Logs or Screenshots: