I have large documents, and I have increased the index.highlight.max_analyzed_offset index setting, but the problem is that highlighting still has to "pick up" the entirety of my giganto-document text field (almost 10 million chars). A better fix would be to restrict highlight fragments to the first 10K characters of the doc, so the operation would be quicker and would always stay under the threshold for the exception.
Highlighting on results that contain large amounts of text is terribly slow. Should I be putting a second field on my docs that holds only the length of content I'm after for highlighting (sketched below), or is it possible for OpenSearch to replicate this important update that was pushed out to ES last year?
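For context, the second-field idea would look something like the following minimal sketch, assuming an ingest pipeline with a script processor; the pipeline name, the `content` and `content_highlight` field names, and the 10K cutoff are all placeholders, not anything from the PR:

```json
PUT _ingest/pipeline/truncate-for-highlight
{
  "description": "Sketch: copy a bounded prefix of 'content' into a smaller field used only for highlighting (field names are placeholders)",
  "processors": [
    {
      "script": {
        "source": "if (ctx.content != null) { ctx.content_highlight = ctx.content.length() > 10000 ? ctx.content.substring(0, 10000) : ctx.content }"
      }
    }
  ]
}
```

The catch is that `content_highlight` would need its own mapping and every doc would need reindexing, plus I'd be storing an extra prefix copy of each document just for highlighting, which is why a query-time limit would be so much cleaner.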
The PR sums up succinctly what I am looking for:
Add a max_analyzed_offset query parameter to allow users to limit the highlighting of text fields to a value less than or equal to the index.highlight.max_analyzed_offset, thus avoiding an exception when the length of the text field exceeds the limit. The highlighting still takes place, but stops at the length defined by the new parameter.
From https://github.com/elastic/elasticsearch/blob/f9af60bf692c1f1bc562a69e1c0e62d9819460a8/docs/reference/search/search-your-data/highlighting.asciidoc
max_analyzed_offset
By default, the maximum number of characters analyzed for a highlight request is bounded by the value defined in the index.highlight.max_analyzed_offset setting, and when the number of characters exceeds this limit an error is returned. If this setting is set to a non-negative value, the highlighting stops at this defined maximum limit, and the rest of the text is not processed, thus not highlighted and no error is returned. The max_analyzed_offset query setting does not override the index.highlight.max_analyzed_offset, which prevails when it is set to a lower value than the query setting.
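For anyone searching later, the Elasticsearch syntax that PR introduced looks roughly like this; the index name, field name, and the 10000 value are placeholders, and whether OpenSearch accepts the same parameter (under the same name) is exactly what I'm asking:

```json
GET /my-index/_search
{
  "query": {
    "match": { "content": "some search term" }
  },
  "highlight": {
    "max_analyzed_offset": 10000,
    "fields": {
      "content": {}
    }
  }
}
```

With the query-time limit set, highlighting just stops at offset 10,000 instead of returning an error, per the docs quoted above.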