Highlighting Speed - max_analyzed_offset at query-time

I have large documents and have already increased the index-level value (index.highlight.max_analyzed_offset), but the problem is that highlighting still has to “pick up” the entirety of my giganto-document text field (almost 10 million chars). A better fix would be to restrict highlight fragments to the first 10K chars of that doc, so the operation would be quicker and would always stay under the threshold that triggers the exception.

Highlighting on results that contain large amounts of text is terribly slow. Should I be putting a second field on my docs that only holds the length of content I’m after for highlighting, or is it possible for OpenSearch to replicate this important update that was pushed out to ES last year?

The PR puts succinctly what I am looking for:

Add a max_analyzed_offset query parameter to allow users to limit the highlighting of text fields to a value less than or equal to the index.highlight.max_analyzed_offset, thus avoiding an exception when the length of the text field exceeds the limit. The highlighting still takes place, but stops at the length defined by the new parameter.

From https://github.com/elastic/elasticsearch/blob/f9af60bf692c1f1bc562a69e1c0e62d9819460a8/docs/reference/search/search-your-data/highlighting.asciidoc

max_analyzed_offset

By default, the maximum number of characters analyzed for a highlight request is bounded by the value defined in the index.highlight.max_analyzed_offset setting, and when the number of characters exceeds this limit an error is returned. If this setting is set to a non-negative value, the highlighting stops at this defined maximum limit, and the rest of the text is not processed, thus not highlighted and no error is returned. The max_analyzed_offset query setting does not override the index.highlight.max_analyzed_offset which prevails when it’s set to lower value than the query setting.
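For anyone wondering what that would look like in a request, here is a rough sketch of a search body using the query-time setting as the ES docs quoted above describe it. The field name "content" and the helper name are made up for illustration, and this is the Elasticsearch syntax; whether OpenSearch accepts the same key is exactly what I am asking about, so treat it as an assumption.

```python
import json


def build_highlight_query(text, field="content", max_offset=10_000):
    """Build a search body that caps highlighting at max_offset characters.

    "content" is a hypothetical field name. The "max_analyzed_offset" key
    follows the Elasticsearch docs quoted above; support for it in
    OpenSearch is an assumption, not something confirmed here.
    """
    return {
        "query": {"match": {field: text}},
        "highlight": {
            # Stop analyzing for highlights after this many characters
            # instead of raising an error on very long fields.
            "max_analyzed_offset": max_offset,
            "fields": {field: {}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_highlight_query("giganto"), indent=2))
```

Per the quoted docs, the index-level index.highlight.max_analyzed_offset still wins if it is lower than the value passed in the query.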

Hello to anyone who visits this forum. I have resigned myself to making my own workaround for the problem. I add a field called “for_highlights” that holds a copy of my document text field truncated to 10K chars. That way I can highlight from it instead of from the search field, which may contain a large amount of text.
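In case it helps someone, this is roughly how the workaround can be put together. It is a minimal sketch, not my exact code: the source field name "content" and the helper names are placeholders, and only "for_highlights" and the 10K limit come from what I described above. The documents are prepared before indexing, and the search body asks for highlights on the short field only.

```python
HIGHLIGHT_LIMIT = 10_000  # number of characters kept for highlighting


def with_highlight_field(doc, source_field="content", limit=HIGHLIGHT_LIMIT):
    """Return a copy of doc with a truncated "for_highlights" field added.

    "content" is a hypothetical name for the large text field.
    """
    out = dict(doc)
    out["for_highlights"] = doc.get(source_field, "")[:limit]
    return out


def build_search_body(text, source_field="content"):
    """Search the full field, but highlight only the truncated copy."""
    return {
        "query": {"match": {source_field: text}},
        "highlight": {
            # The query targets source_field, not for_highlights, so
            # field matching must be relaxed for highlights to appear.
            "require_field_match": False,
            "fields": {"for_highlights": {}},
        },
    }


if __name__ == "__main__":
    doc = with_highlight_field({"content": "x" * 25_000})
    print(len(doc["for_highlights"]))
    print(build_search_body("giganto"))
```

The obvious trade-off is that a match beyond the first 10K chars still returns the document but produces no highlight fragment, and the index stores the extra copy of the text.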

It’s been two weeks and this is the best answer I can come up with until OpenSearch implements that ES fix.