Hybrid query highlight lexical matches

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch version 3.2.0

Describe the issue: Lexical highlight not returned for hybrid query
I have a k-NN index and I want to use a hybrid query while still keeping track of the lexical matches, but OpenSearch seems to ignore the highlight parameter.

Here is an example of the query structure:

GET my_index/_search?search_pipeline=hybrid_search_pipeline
{
  "_source": {
    "excludes": [
      "chunk_embedding"
    ]
  },
  "collapse": {
    "field": "id"
  },
  "highlight": {
    "fragment_size": 100,
    "post_tags": [
      "</span>"
    ],
    "pre_tags": [
      """<span solrfound="true" style="background:#fdfdb0;">"""
    ],
    "fields": {
      "description": {},
      "altDescription": {}
    }
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "query_string": {
            "fields": [
              "description^2.0",
              "altDescription^1.5",
              "attachmentData"
            ],
            "query": "rossini~1"
          }
        },
        {
          "neural": {
            "chunk_embedding": {
              "query_text": "rossini"
            }
          }
        }
      ]
    }
  },
  "size": 50
}

The search does not fail and results are returned, but the hits contain no “highlight” element.
Searching similar topics, I found some that discuss semantic highlighting, but that is NOT what I need for now. I only need highlighting for the lexical matches.
I also tried adding the “highlight_query” parameter, but it did not solve the issue.
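To make the symptom easy to check on any response, here is a minimal Python sketch that lists the hits lacking a "highlight" section. It assumes the search response has already been parsed into a dict; the helper name `hits_without_highlight` and the sample response are hypothetical and only illustrate the standard `hits.hits[]` layout.

```python
from typing import Any


def hits_without_highlight(response: dict[str, Any]) -> list[str]:
    """Return the _id of every hit that has no non-empty "highlight" section."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit.get("_id", "<no id>") for hit in hits if not hit.get("highlight")]


# Hypothetical response, trimmed to the fields the helper reads.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"description": "rossini"}},
            {"_id": "2", "highlight": {"description": ["<span>rossini</span>"]}},
        ]
    }
}

missing = hits_without_highlight(sample_response)
print(missing)
```

If every returned hit shows up in this list, the highlight section is being dropped for the whole request rather than for individual documents.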

Configuration: k-NN index. Both fields I am trying to highlight have the following mapping:

{
  "store": true,
  "term_vector": "yes",
  "type": "text"
}

Can anyone help me?

@adrianahariuc Thank you for the question. There is already a feature request raised for highlighting with hybrid queries here. I would recommend adding a comment with your configuration there.

@adrianahariuc Could you share the steps to reproduce the issue in [FEATURE] Highlight feature in hybrid query · Issue #1215 · opensearch-project/neural-search · GitHub?

Thank you for the reference. I commented there with the steps to reproduce the issue.

I am also posting them here to keep the discussion up to date:

#INDEX SCHEMA
PUT /hybrid_highlight_index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "attachmentData": {
        "store": true,
        "term_vector": "yes",
        "type": "text"
      },
      "chunk_embedding": {
        "dimension": 768,
        "type": "knn_vector"
      },
      "description": {
        "store": true,
        "term_vector": "yes",
        "type": "text"
      }
    }
  }
}

#INGEST PIPELINE
PUT _ingest/pipeline/hybrid_highlight_index_ingest_pipeline
{
  "description": "Ingestion pipeline for core: hybrid_highlight_index",
  "processors": [
    {
      "text_embedding": {
        "field_map": {
          "attachmentData": "chunk_embedding"
        },
        "model_id": "py4XKpoBJXee6OmaWDRY"
      }
    }
  ]
}

#BULK INGESTION
POST /hybrid_highlight_index/_bulk?pipeline=hybrid_highlight_index_ingest_pipeline
{ "index": {} }
{ "attachmentData": """Wuthering Heights, Emily Brontë's 1847 novel, is a dark, passionate tale set on the bleak Yorkshire moors, exploring obsessive love, revenge, and social class through the destructive relationship of Catherine Earnshaw and Heathcliff, framed by a narrative where outsider Mr. Lockwood hears the tragic story from housekeeper Nelly Dean, revealing a world of fierce emotions and supernatural undertones.""", "description":"""Wuthering Heights - Emily Brontë""" }
{ "index": {} }
{ "attachmentData": """The Magic Mountain (1924) by Thomas Mann is a monumental novel about young German engineer Hans Castorp, who visits his cousin at a tuberculosis sanatorium in the Swiss Alps, intending a short stay but getting drawn into the isolated, timeless world of illness, philosophy, and pre-WWI European culture for seven years, exploring life, death, love (with Clavdia Cauchat), and politics before being pulled back to the "flatland" and the outbreak of war. It's a philosophical bildungsroman (coming-of-age story) using the microcosm of the Berghof sanatorium to reflect the macrocosm of a world on the brink of chaos, contrasting health and sickness, spirit and flesh, and intellect versus instinct. """, "description":"""The Magic Mountain - Thomas Mann""" }
{ "index": {} }
{ "attachmentData": """The Unbearable Lightness of Being's introduction sets up the novel's core philosophical dilemma: the conflict between "lightness" (meaninglessness, freedom from consequence) and "weight" (purpose, responsibility, eternal return), using the backdrop of Prague during the 1968 Soviet invasion to explore these ideas through the interwoven lives of surgeon Tomas, his wife Tereza, his mistress Sabina, and her lover Franz, blending love, politics, and existential questions. It immediately contrasts Nietzsche's eternal return (heavy) with Parmenides' concept of single-occurrence life (light), suggesting life's fleeting moments make choices weightless, a tension central to the characters' struggles with love, fidelity, and freedom. """, "description":"""The Unbearable Lightness of Being - Milan Kundera""" }

#CHECK RECORDS
GET hybrid_highlight_index/_search
{
  "query": {
    "match_all": {}
  }
}

#SEARCH PIPELINE
PUT /_search/pipeline/hybrid_highlight_index_search_pipeline
{
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.5,
              0.5
            ]
          }
        }
      }
    }
  ],
  "request_processors": [
    {
      "neural_query_enricher": {
        "default_model_id": "py4XKpoBJXee6OmaWDRY"
      }
    }
  ]
}

#SEARCH WITH HIGHLIGHT (highlight not returned)
GET hybrid_highlight_index/_search?search_pipeline=hybrid_highlight_index_search_pipeline
{
  "highlight": {
    "pre_tags": [
      "<strong>"
    ],
    "post_tags": [
      "</strong>"
    ],
    "fields": {
      "attachmentData": {}
    }
  },
  "size": 50,
  "query": {
    "hybrid": {
      "queries": [
        {
          "query_string": {
            "query": "Swiss~1 Alps~1",
            "fields": [
              "description^2.0",
              "attachmentData"
            ]
          }
        },
        {
          "neural": {
            "chunk_embedding": {
              "query_text": "Swiss Alps"
            }
          }
        }
      ]
    }
  },
  "_source": {
    "excludes": "chunk_embedding"
  }
}

#SEARCH WITH HIGHLIGHT WITH HIGHLIGHT QUERY (highlight not returned)
GET hybrid_highlight_index/_search?search_pipeline=hybrid_highlight_index_search_pipeline
{
  "highlight": {
    "pre_tags": [
      "<strong>"
    ],
    "post_tags": [
      "</strong>"
    ],
    "fields": {
      "attachmentData": {
        "highlight_query": {
          "query_string": {
            "query": "Swiss~1 Alps~1",
            "fields": [
              "attachmentData"
            ]
          }
        }
      }
    }
  },
  "size": 50,
  "query": {
    "hybrid": {
      "queries": [
        {
          "query_string": {
            "query": "Swiss~1 Alps~1",
            "fields": [
              "description^2.0",
              "attachmentData"
            ]
          }
        },
        {
          "neural": {
            "chunk_embedding": {
              "query_text": "Swiss Alps"
            }
          }
        }
      ]
    }
  },
  "_source": {
    "excludes": "chunk_embedding"
  }
}

@adrianahariuc @heemin I did some further testing and it seems this is not specific to hybrid queries; see the findings below:

  1. Highlighting works on text fields without embeddings.
  2. Highlighting works on k-NN indexes without the text_embedding processor.
  3. The text_embedding processor breaks highlighting, even when applied after initial indexing.
  4. Simple match queries don’t return highlights after the text_embedding processor has been used.
  5. query_string queries don’t return highlights after the text_embedding processor has been used.
  6. The same queries work perfectly on fields WITHOUT embeddings.
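A comparison like this is easier to repeat with one fixed probe request. The following is a minimal Python sketch, assuming a plain `match` query with highlighting on the same field is a fair stand-in for the non-hybrid case; the function name `build_highlight_probe` is hypothetical, and the body can be sent to `_search` with whatever client is in use.

```python
import json
from typing import Any


def build_highlight_probe(field: str, text: str) -> dict[str, Any]:
    """Build a plain (non-hybrid) match query that highlights the same field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "pre_tags": ["<strong>"],
            "post_tags": ["</strong>"],
            "fields": {field: {}},
        },
    }


# Field name and text taken from the reproduction steps in this thread.
probe = build_highlight_probe("attachmentData", "Swiss Alps")
print(json.dumps(probe, indent=2))
```

Running the same probe against an index populated with and without the ingest pipeline separates the effect of the text_embedding processor from the effect of the hybrid query.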

I also updated the issue with these findings, but I raised a new issue for this because the existing one is specific to hybrid search.

I tried upgrading to the latest release and highlighting seems to work as expected. I am going to check some more test cases, but upgrading should fix the issue.

New versions:

  • OpenSearch 3.3.2
  • OpenSearch Dashboards 3.3.0
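For anyone checking their own cluster, here is a minimal Python sketch that compares the version reported by the root endpoint (`GET /`, field `version.number`) with the version reported as working in this thread. The 3.3.2 threshold is an assumption taken from this thread only; it is not confirmed here which release actually contains the fix. The function names are hypothetical.

```python
from typing import Any


def parse_version(number: str) -> tuple[int, ...]:
    """Turn "3.3.2" (or "3.3.2-SNAPSHOT") into a comparable tuple such as (3, 3, 2)."""
    return tuple(int(part) for part in number.split("-")[0].split("."))


def is_at_least(root_response: dict[str, Any], minimum: str = "3.3.2") -> bool:
    """Check the cluster version from a parsed `GET /` response against a minimum."""
    return parse_version(root_response["version"]["number"]) >= parse_version(minimum)


# Hypothetical root responses, trimmed to the field the helper reads.
old_cluster = {"version": {"number": "3.2.0"}}
new_cluster = {"version": {"number": "3.3.2"}}

print(is_at_least(old_cluster), is_at_least(new_cluster))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "3.10.0" sorting before "3.3.2".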