Versions:
- OpenSearch 3.3.2
- OpenSearch Dashboards 3.3.0
(Windows environment)
Overall situation:
I have a k-NN index and I need to run a hybrid search that collapses on a keyword field while also retrieving the inner hits. There has been recent news on this, and starting with the 3.0 release it has been officially documented as a supported feature for hybrid queries (link to the official documentation: Using inner hits in hybrid queries - OpenSearch Documentation).
Issue:
When I use collapse with inner hits in combination with a hybrid query, I get the following error:
{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "failed to expand hits",
    "phase": "expand",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "search_phase_execution_exception",
      "reason": "all shards failed",
      "phase": "query",
      "grouped": true,
      "failed_shards": [
        {
          "shard": 0,
          "index": "innerhits_expansion_error_index",
          "node": "fMKRQ9WpTxWJENS0O5hI6w",
          "reason": {
            "type": "e_o_f_exception",
            "reason": "read past EOF (pos=2147483647): MemorySegmentIndexInput(path=\"C:\\ZWeb\\OpenSearch\\opensearch\\data\\nodes\\0\\indices\\X3dJFJyZTfie9IdrxqrSOg\\0\\index\\_0.cfs\") [slice=_0.nvd] [slice=randomaccess]"
          }
        }
      ],
      "caused_by": {
        "type": "e_o_f_exception",
        "reason": "read past EOF (pos=2147483647): MemorySegmentIndexInput(path=\"<my_path>\") [slice=_0.nvd] [slice=randomaccess]",
        "caused_by": {
          "type": "e_o_f_exception",
          "reason": "read past EOF (pos=2147483647): MemorySegmentIndexInput(path=\"<my_path>\") [slice=_0.nvd] [slice=randomaccess]"
        }
      }
    }
  },
  "status": 500
}
The error only occurs when I also try to retrieve the inner hits. Since the failure is reported in the "expand" phase, it seems that the expansion of the inner hits is failing for some reason.
Instructions to replicate the issue:
#INDEX SCHEMA
PUT /innerhits_expansion_error_index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "author": {
        "store": true,
        "type": "keyword"
      },
      "attachmentData": {
        "store": true,
        "term_vector": "yes",
        "type": "text"
      },
      "chunk_embedding": {
        "dimension": 768,
        "type": "knn_vector"
      },
      "description": {
        "store": true,
        "term_vector": "yes",
        "type": "text"
      }
    }
  }
}
#INGEST PIPELINE
PUT _ingest/pipeline/innerhits_expansion_error_index_ingest_pipeline
{
  "description": "Ingest pipeline for core: innerhits_expansion_error_index",
  "processors": [
    {
      "text_embedding": {
        "field_map": {
          "attachmentData": "chunk_embedding"
        },
        "model_id": "wxPEApsBh9UO9dEM5IFh"
      }
    }
  ]
}
#BULK INGESTION
POST /innerhits_expansion_error_index/_bulk?pipeline=innerhits_expansion_error_index_ingest_pipeline
{ "index": {} }
{ "attachmentData": """Wuthering Heights, Emily Brontë's 1847 novel, is a dark, passionate tale set on the bleak Yorkshire moors, exploring obsessive love, revenge, and social class through the destructive relationship of Catherine Earnshaw and Heathcliff, framed by a narrative where outsider Mr. Lockwood hears the tragic story from housekeeper Nelly Dean, revealing a world of fierce emotions and supernatural undertones.""", "description":"""Wuthering Heights""", "author":"""Emily Brontë""" }
{ "index": {} }
{ "attachmentData": """Emily Brontë's "The Night is Darkening Round Me" (also known as "Spellbound") is a powerful poem about being trapped by an intense, perhaps loving, force amidst a fierce, darkening natural landscape, using vivid imagery of wild winds, snow, and endless wastes to convey a feeling of being bound by a "tyrant spell" that, despite its gloom, the speaker welcomes, refusing to leave due to an internal resolve or connection stronger than external dread. The poem sets a scene of impending storm and desolation, but the speaker's repeated insistence, "I will not, cannot go," reveals a chosen captivity, highlighting themes of nature, internal feeling, and a powerful, binding emotion. """, "description":"""The Night is Darkening Round Me""", "author":"""Emily Brontë"""}
{ "index": {} }
{ "attachmentData": """The Magic Mountain (1924) by Thomas Mann is a monumental novel about young German engineer Hans Castorp, who visits his cousin at a tuberculosis sanatorium in the Swiss Alps, intending a short stay but getting drawn into the isolated, timeless world of illness, philosophy, and pre-WWI European culture for seven years, exploring life, death, love (with Clavdia Cauchat), and politics before being pulled back to the "flatland" and the outbreak of war. It's a philosophical bildungsroman (coming-of-age story) using the microcosm of the Berghof sanatorium to reflect the macrocosm of a world on the brink of chaos, contrasting health and sickness, spirit and flesh, and intellect versus instinct. """, "description":"""The Magic Mountain""", "author":"""Thomas Mann"""}
{ "index": {} }
{ "attachmentData": """The Unbearable Lightness of Being's introduction sets up the novel's core philosophical dilemma: the conflict between "lightness" (meaninglessness, freedom from consequence) and "weight" (purpose, responsibility, eternal return), using the backdrop of Prague during the 1968 Soviet invasion to explore these ideas through the interwoven lives of surgeon Tomas, his wife Tereza, his mistress Sabina, and her lover Franz, blending love, politics, and existential questions. It immediately contrasts Nietzsche's eternal return (heavy) with Parmenides' concept of single-occurrence life (light), suggesting life's fleeting moments make choices weightless, a tension central to the characters' struggles with love, fidelity, and freedom. """, "description":"""The Unbearable Lightness of Being""", "author":"""Milan Kundera"""}
#CHECK RECORDS
GET innerhits_expansion_error_index/_search
{
  "query": {
    "match_all": {}
  }
}
#SEARCH PIPELINE
PUT /_search/pipeline/innerhits_expansion_error_index_search_pipeline
{
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.5,
              0.5
            ]
          }
        }
      }
    }
  ],
  "request_processors": [
    {
      "neural_query_enricher": {
        "default_model_id": "wxPEApsBh9UO9dEM5IFh"
      }
    }
  ]
}
#SEARCH WITH COLLAPSE (NO INNERHITS) - no error
GET innerhits_expansion_error_index/_search?search_pipeline=innerhits_expansion_error_index_search_pipeline
{
  "size": 50,
  "query": {
    "hybrid": {
      "queries": [
        {
          "query_string": {
            "query": "storm~1",
            "fields": [
              "description^2.0",
              "attachmentData"
            ]
          }
        },
        {
          "neural": {
            "chunk_embedding": {
              "query_text": "storm"
            }
          }
        }
      ]
    }
  },
  "_source": {
    "excludes": "chunk_embedding"
  },
  "collapse": {
    "field": "author"
  }
}
#SEARCH WITH COLLAPSE (WITH INNERHITS) - ERROR
GET innerhits_expansion_error_index/_search?search_pipeline=innerhits_expansion_error_index_search_pipeline
{
  "size": 50,
  "query": {
    "hybrid": {
      "queries": [
        {
          "query_string": {
            "query": "storm~1",
            "fields": [
              "description^2.0",
              "attachmentData"
            ]
          }
        },
        {
          "neural": {
            "chunk_embedding": {
              "query_text": "storm"
            }
          }
        }
      ]
    }
  },
  "_source": {
    "excludes": "chunk_embedding"
  },
  "collapse": {
    "field": "author",
    "inner_hits": [
      {
        "size": 100,
        "name": "innerHits",
        "_source": {
          "includes": [
            "attachmentData"
          ]
        }
      }
    ]
  }
}
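To make the difference between the two requests explicit, here is a small Python sketch that builds both bodies from a shared base. It is only an illustration and not part of my actual setup: the helper name `with_inner_hits` and the variable names are made up for this post, and it does not talk to OpenSearch at all. It just shows that the only change between the working and the failing request is the `inner_hits` block under `collapse`.

```python
import copy
import json

# Shared part of both search requests, copied from the queries in this post.
BASE_BODY = {
    "size": 50,
    "query": {
        "hybrid": {
            "queries": [
                {
                    "query_string": {
                        "query": "storm~1",
                        "fields": ["description^2.0", "attachmentData"],
                    }
                },
                {"neural": {"chunk_embedding": {"query_text": "storm"}}},
            ]
        }
    },
    "_source": {"excludes": "chunk_embedding"},
    "collapse": {"field": "author"},
}


def with_inner_hits(body, size=100, name="innerHits"):
    """Return a copy of `body` with an inner_hits entry added to collapse.

    Hypothetical helper for illustration; the original body is not modified.
    """
    out = copy.deepcopy(body)
    out["collapse"]["inner_hits"] = [
        {"size": size, "name": name, "_source": {"includes": ["attachmentData"]}}
    ]
    return out


working = BASE_BODY                   # collapse only: no error
failing = with_inner_hits(BASE_BODY)  # collapse + inner_hits: "failed to expand hits"

if __name__ == "__main__":
    # Print the only part of the request that differs between the two.
    print(json.dumps(failing["collapse"], indent=2))
```

Everything outside `collapse.inner_hits` is identical in the two requests, which is why I suspect the inner hits expansion itself rather than the hybrid query or the search pipeline.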
Am I doing something wrong in terms of query structure?