Failed to index texts with lengths above 32766

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
"version" : {
  "distribution" : "opensearch",
  "number" : "3.1.0",
  "build_type" : "tar",
  "build_hash" : "8ff7c6ee924a49f0f59f80a6e1c73073c8904214",
  "build_date" : "2025-06-21T08:05:43.345081313Z",
  "build_snapshot" : false,
  "lucene_version" : "10.2.1",
  "minimum_wire_compatibility_version" : "2.19.0",
  "minimum_index_compatibility_version" : "2.0.0"
}

Describe the issue:

Indexing fails for any text longer than 32766. The error looks like this:

ERROR:main:Error 1: {'index': {'_index': 'table1', '_id': '123456789', 'status': 400, 'error': {'type': 'not_x_content_exception', 'reason': 'Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes'}}}

Any idea how to solve it? I would like to avoid truncating text if possible.

hey @summerist.l ,

Do you have a sample of what you're trying to send, and what is sending it?

Leeroy.

@Leeroy Thank you for the reply! I am not able to share the original text, but it is just user reviews in plain text: no emoji, no special characters, fully UTF-8 compatible. I have run a number of tests, and the failure depends only on text length.

Solved. The cause of the error is that I defined a 'keyword' sub-field on the text field in the index schema, like this:

        "extracted_text": {
            "type": "text",
            "analyzer": "custom_text_analyzer",
            "fields": {
                "keyword": {"type": "keyword"}
            }
        },

Lucene has a maximum term length of 32766 bytes, and a 'keyword' field indexes the whole value as a single term, so any longer text is rejected. The error went away once I removed the 'fields' entry from the schema. Removing it does not affect hybrid search at all.
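For anyone who would rather keep the keyword sub-field than remove it, here is a minimal sketch of an alternative. It assumes the `ignore_above` mapping parameter is acceptable for your use case, which means over-long values are simply not indexed in the keyword sub-field while the analyzed text field is unaffected. The names `MAX_TERM_BYTES`, `exceeds_term_limit` and `extracted_text_mapping` are made up for illustration. I have not run this against the cluster from this thread.

```python
import json

# Lucene's limit on a single indexed term, measured in bytes, not characters.
MAX_TERM_BYTES = 32766


def exceeds_term_limit(value: str) -> bool:
    """Return True if value, encoded as UTF-8, is longer than the term limit."""
    return len(value.encode("utf-8")) > MAX_TERM_BYTES


# Same field as in the schema above, but the keyword sub-field skips long values.
# ignore_above counts characters; a UTF-8 character takes at most 4 bytes,
# so 32766 // 4 = 8191 characters can never exceed the byte limit.
extracted_text_mapping = {
    "extracted_text": {
        "type": "text",
        "analyzer": "custom_text_analyzer",
        "fields": {
            "keyword": {"type": "keyword", "ignore_above": MAX_TERM_BYTES // 4}
        },
    }
}

if __name__ == "__main__":
    # Print the mapping fragment so it can be pasted into an index definition.
    print(json.dumps(extracted_text_mapping, indent=2))
```

The helper can also be used on the client side to flag documents that would have hit the limit before they are sent in a bulk request.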
