Issue with message field data

Xuupu · March 20, 2023, 7:17pm

Hello!

I have a question about this error:

2023-03-20T15:16:32.387-04:00 ERROR [PivotAggregationSearch] Aggregation search query returned an error: OpenSearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [message] in order to load field data by uninverting the inverted index. Note that this can use significant memory.].

I recently switched from Elasticsearch to OpenSearch 2.6. I’ve tried setting fielddata to true through the API, but nothing changed and I still get the error.

gaobinlong · March 28, 2023, 8:48am

How did you set fielddata to true for the field message? I’ve tried updating the mapping and it works:

PUT test1/_mapping
{
  "properties": {
    "message": {
      "type": "text",
      "fielddata": true
    }
  }
}

radu.gheorghe · March 30, 2023, 7:58am

Let me also expand on:

Note that this can use significant memory

Fielddata will un-invert your inverted index for that field and store the result in memory, as a mapping between document IDs and terms. So if you have a lot of data (e.g. logs or other use-cases with millions to billions of documents) you might bump into some circuit breaker (fielddata or parent) or cause instability in the cluster.
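To make the "un-inverting" concrete, here is a rough Python sketch of the idea. It is only an illustration, not how OpenSearch implements fielddata internally; the `uninvert` function and the sample terms are made up for this example.

```python
from collections import defaultdict


def uninvert(inverted_index):
    """Turn a term -> doc IDs mapping into a doc ID -> terms mapping."""
    fielddata = defaultdict(list)
    # Walk every term and every document that contains it.
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            fielddata[doc_id].append(term)
    # Sort the terms per document so the result is deterministic.
    return {doc_id: sorted(terms) for doc_id, terms in fielddata.items()}


# Hypothetical inverted index for a tiny "message" field.
inverted_index = {
    "error": [1, 3],
    "timeout": [1],
    "login": [2, 3],
}

fielddata = uninvert(inverted_index)
print(fielddata)
```

Note that the whole index has to be walked to build the per-document view, which is why the cost grows with the total amount of data in the field rather than with the number of hits.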

Even if you don’t have a lot of data, this structure has to be built from scratch on the first query after every refresh, so that query will be noticeably delayed. It’s also going to be built for all the data in that field, even if that first query only returns two documents.

The usual practice is to use doc_values for aggregations. They aren’t available for text fields, though, only for keyword fields, and with a keyword field you don’t get analysis (i.e. breaking the text into tokens). To work around that, some people break the text into tokens in the pipeline, before indexing data into Elasticsearch, then index it as an array of “keywords”. I know it’s not apples-to-apples, I’m just throwing out some ideas.
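A minimal Python sketch of that pipeline-side tokenization could look like the following. The field name `message_tokens`, the helper functions and the simple regex tokenizer are assumptions for illustration; the regex is not equivalent to the standard analyzer, so the tokens may differ from what a text field would produce.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_document(message):
    """Build the document to index: original text plus a token array."""
    return {
        "message": message,
        # Deduplicated tokens, meant to be indexed as a keyword array.
        "message_tokens": sorted(set(tokenize(message))),
    }


# Mapping body for the index: the hypothetical message_tokens field is a
# keyword field, so aggregations can use doc_values instead of fielddata.
mapping = {
    "properties": {
        "message": {"type": "text"},
        "message_tokens": {"type": "keyword"},
    }
}

doc = to_document("Login failed: login timeout")
print(doc)
```

Aggregations would then run on `message_tokens` while full-text search keeps using `message`.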
