Hybrid search on nested fields

adrianahariuc · March 31, 2025, 1:29pm

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.19.1

Describe the issue:
I followed the available documentation and created a k-NN index in which every record has a semantic field (of type text). During ingestion, the content of this field is chunked and a vector embedding is generated for each chunk. As a result, each record has a multi-valued field named “nested_chunks_embeddings” that contains nested elements structured this way:

   {
        "text" : "textual content of the chunk",
        "embedding" : "embedding generated from the textual content of the chunk"
   }

Now I want to run a hybrid search on the index to find the CHUNKS that best match the query text given as input. I can’t store chunks as separate records. What I want to obtain is a list of records that satisfy the query and, for each record, a list of the chunks that actually matched.

To do this, I wrapped a hybrid query in a nested query on the mentioned nested field:

semantic on “nested_chunks_embeddings.embedding”
lexical on “nested_chunks_embeddings.text”

By adding the inner_hits option I get the chunks that matched, but the normalization processor does not seem to be applied. The score of each chunk appears to be the plain sum of the semantic and lexical scores: the lexical score is not normalized, and neither score is given the weight configured in the normalization processor. This is obviously problematic.

What I basically need is to find the chunks that best satisfy the search criteria both semantically and lexically. Is there any other way to achieve this?

Configuration:
example of index:

PUT testindex
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "semantic_field_to_use_for_chunking": {
        "type": "text"
      },
      "nested_chunks_embeddings": {
        "type": "nested",
        "properties": {
          "text": {
            "type": "text"
          },
          "embedding": {
            "dimension": 768,
            "type": "knn_vector"
          }
        }
      }
    }
  }
}

Query that I used:

GET /testindex/_search?search_pipeline=hybrid_search
{
  "query": {
    "nested": {
      "score_mode": "max",
      "path": "nested_chunks_embeddings",
      "inner_hits":{"from":0},
      "query": {
        "hybrid": {
          "queries": [
            {
              "neural": {
                "nested_chunks_embeddings.embedding": {
                  "query_text": "pipeline configuration in opensearch",
                  "model_id": "PZCY0pUB9e1VVreM-Wei",
                  "expand_nested_docs": true,
                  "filter": {
                    "match":{
                      "nested_chunks_embeddings.field":"passage_text"
                    }
                  }
                }
              }
            },
            {
              "query_string": {
                "query": "pipeline AND configuration AND opensearch",
                "fields": [
                  "nested_chunks_embeddings.chunk"
                ]
              }
            }
          ]
        }
      }
    }
  }
}

The search pipeline is standard:

PUT _search/pipeline/hybrid_search
{
  "description": "processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.5,
              0.5
            ]
          }
        }
      }
    }
  ]
}
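
To make the expectation concrete, here is a minimal Python sketch of what min_max normalization followed by a weighted arithmetic_mean would produce per chunk, compared with the plain sum that the inner hits seem to show. It uses the textbook formulas and is not the plugin's actual implementation (which may handle edge cases differently); the helper names and all score values are made up for illustration.

```python
from typing import Sequence


def min_max(scores: Sequence[float]) -> list[float]:
    """Textbook min-max scaling of one sub-query's scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: when all scores are equal, treat each as a full match.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_mean(per_query: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean of one chunk's normalized sub-query scores."""
    return sum(w * s for w, s in zip(weights, per_query)) / sum(weights)


# Hypothetical raw scores for three chunks (illustrative numbers only).
semantic = [0.82, 0.74, 0.60]  # scores from the neural sub-query
lexical = [12.4, 3.1, 7.9]     # scores from the query_string sub-query

weights = [0.5, 0.5]  # same weights as in the search pipeline above

norm_sem = min_max(semantic)
norm_lex = min_max(lexical)

# What the pipeline configuration suggests each chunk's score should be.
expected = [weighted_mean(pair, weights) for pair in zip(norm_sem, norm_lex)]

# What the inner hits appear to contain instead: an unnormalized sum.
raw_sum = [s + l for s, l in zip(semantic, lexical)]

for i, (e, r) in enumerate(zip(expected, raw_sum)):
    print(f"chunk {i}: expected={e:.3f} raw_sum={r:.3f}")
```

With a plain sum, the unbounded lexical score dominates the semantic one, which is why the missing normalization matters for ranking chunks.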

Another approach I tried is using a hybrid query as the wrapper and making both the lexical and the semantic queries nested. This way the scores seem fine, but I lose the inner_hits option, which is essential. The query:

GET /testindex/_search?search_pipeline=hybrid_search
{
  "query": {
    "hybrid":{
      "queries":[
        {
          "nested":{
            "score_mode": "max",
            "path": "nested_chunks_embeddings",
            "inner_hits":{},
            "query":{
              "neural": {
                "nested_chunks_embeddings.embedding": {
                  "query_text": "pipeline configuration in opensearch",
                  "model_id": "PZCY0pUB9e1VVreM-Wei",
                  "expand_nested_docs": true,
                  "filter": {
                    "match":{
                      "nested_chunks_embeddings.field":"passage_text"
                    }
                  }
                }
              }
            }
          }
        },
        {
          "nested":{
            "score_mode": "max",
            "path": "nested_chunks_embeddings",
            "inner_hits":{},
            "query":{
              "query_string": {
                "query": "pipeline AND configuration AND opensearch",
                "fields": [
                  "nested_chunks_embeddings.chunk"
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Relevant Logs or Screenshots:

pablo · April 28, 2025, 1:07pm

@adrianahariuc Could you take a look at this GitHub feature request?

github.com/opensearch-project/neural-search

[FEATURE] Hybrid request does not return inner_hits for nested objects.

opened 10:16AM - 30 Apr 24 UTC

closed 11:25PM - 07 Apr 25 UTC

Kovsonq

v3.0.0 enhancement hybrid search

### Is your feature request related to a problem?
Yes, I'm experiencing a problem when I use the hybrid search plugin in OpenSearch v2.11.0. Specifically, when I include the "inner_hits" parameter in my query for nested objects, I do not receive any inner hits in the response. This is causing frustration as my system requires this level of detail for optimal operation.

### What solution would you like?
I would like the hybrid search plugin to be updated to include the functionality to correctly return inner hits from nested queries. Ideally, this would function seamlessly as it does in standard OpenSearch queries. This improvement would allow me and other users to fully utilize the power of the hybrid search plugin.

It is included in the roadmap for OpenSearch v3.0.0.
