Highlighting Speed - max_analyzed_offset at query-time

patrick · March 3, 2022, 6:18pm

I have large documents and have increased the index-level limit, but the problem is that highlighting has to “pick up” the entirety of my giganto-document text field (almost 10 million chars). A better fix would be to restrict highlight fragments to the first 10K chars of that doc, so the operation would be quicker and would always stay under the threshold that triggers the exception.
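
For reference, the index-level limit mentioned here is the index.highlight.max_analyzed_offset setting. A minimal sketch of the settings body that raises it, assuming it is sent with an index settings update (PUT /<index>/_settings); the helper name and the 10,000,000 value are illustrative only:

```python
from typing import Any, Dict


def highlight_limit_settings(max_offset: int) -> Dict[str, Any]:
    """Build a settings body that sets the index-level highlight limit.

    Hypothetical helper: it only builds the JSON body and sends nothing.
    """
    if max_offset < 0:
        raise ValueError("max_offset must be non-negative")
    return {"index.highlight.max_analyzed_offset": max_offset}


# Illustrative value sized for a text field of roughly 10 million chars.
settings_body = highlight_limit_settings(10_000_000)
```

Raising the limit only removes the exception. The highlighter still has to analyze the whole field, which is why it stays slow.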

Highlighting on results that contain large amounts of text is terribly slow. Should I put a second field on my docs that holds only the length of content I’m after for highlighting, or is it possible for OpenSearch to replicate this important update that was pushed out to ES last year?

The PR puts succinctly what I am looking for:

Add a max_analyzed_offset query parameter to allow users to limit the highlighting of text fields to a value less than or equal to the index.highlight.max_analyzed_offset, thus avoiding an exception when the length of the text field exceeds the limit. The highlighting still takes place, but stops at the length defined by the new parameter.

github.com/elastic/elasticsearch

Add query param to limit highlighting to specified length

elastic:master ← matriv:limitHighlighting, opened 10:05AM - 12 Jan 21 UTC by matriv (+564 -214)

Closes: #52155

From https://github.com/elastic/elasticsearch/blob/f9af60bf692c1f1bc562a69e1c0e62d9819460a8/docs/reference/search/search-your-data/highlighting.asciidoc

max_analyzed_offset

By default, the maximum number of characters analyzed for a highlight request is bounded by the value defined in the index.highlight.max_analyzed_offset setting, and when the number of characters exceeds this limit an error is returned. If this setting is set to a non-negative value, the highlighting stops at this defined maximum limit, and the rest of the text is not processed, thus not highlighted and no error is returned. The max_analyzed_offset query setting does not override the index.highlight.max_analyzed_offset, which prevails when it’s set to a lower value than the query setting.
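
A rough sketch of what a search body using this query-time setting could look like, based on the documentation quoted above. The field name and query text are placeholders, placing max_analyzed_offset at the top level of the highlight object is an assumption, and whether a given OpenSearch version accepts the parameter is exactly the open question in this thread:

```python
from typing import Any, Dict


def build_highlight_request(field: str, text: str, max_analyzed_offset: int) -> Dict[str, Any]:
    """Build a search body that caps highlighting at max_analyzed_offset chars.

    Hypothetical helper: it only builds the JSON body and sends nothing.
    """
    if max_analyzed_offset < 0:
        raise ValueError("max_analyzed_offset must be non-negative")
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {field: {}},
            # Query-time cap: highlighting stops here instead of raising an error.
            "max_analyzed_offset": max_analyzed_offset,
        },
    }


def effective_offset(query_offset: int, index_offset: int) -> int:
    """Per the quoted docs, the index setting prevails when it is lower."""
    return min(query_offset, index_offset)


body = build_highlight_request("content", "giganto", 10_000)
```

Here effective_offset only illustrates the precedence rule in the last sentence of the quoted docs. The server applies that rule itself; the client does not need to compute it.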

patrick · March 19, 2022, 2:01pm

Hello to anyone who visits this forum. I have resigned myself to making my own workaround for the problem. I essentially add a second text field called “for_highlights”, which is a copy of my document text field truncated to 10k chars. This way I am able to highlight from it, and not from the search field, which may contain a large amount of text.

It’s been two weeks and this is the best answer I can come up with until OpenSearch implements that ES fix.
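
A minimal sketch of the workaround described above, assuming the full text lives in a field named “text” (a placeholder; only “for_highlights” comes from the post). The first helper would run before indexing. The second builds a search body that matches on the full field but highlights only the truncated copy. Setting require_field_match to false is an assumption about how to get highlights on a field other than the one being queried:

```python
from typing import Any, Dict

HIGHLIGHT_LIMIT = 10_000  # 10k chars, as in the post


def add_highlight_field(doc: Dict[str, Any],
                        source_field: str = "text",
                        highlight_field: str = "for_highlights",
                        limit: int = HIGHLIGHT_LIMIT) -> Dict[str, Any]:
    """Return a copy of doc with a truncated duplicate of the text field."""
    out = dict(doc)
    out[highlight_field] = str(doc.get(source_field, ""))[:limit]
    return out


def build_workaround_request(text: str,
                             source_field: str = "text",
                             highlight_field: str = "for_highlights") -> Dict[str, Any]:
    """Search the full field, but highlight only the short copy."""
    return {
        "query": {"match": {source_field: text}},
        "highlight": {
            # Assumption: needed because the query targets a different field.
            "require_field_match": False,
            "fields": {highlight_field: {}},
        },
    }


doc = add_highlight_field({"title": "big doc", "text": "a" * 25_000})
request = build_workaround_request("giganto")
```

The trade-off is extra storage per document, and matches that occur only after the first 10k chars produce no highlight.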
