Issue with message field data


I have a question about this error:

2023-03-20T15:16:32.387-04:00 ERROR [PivotAggregationSearch] Aggregation search query returned an error: OpenSearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [message] in order to load field data by uninverting the inverted index. Note that this can use significant memory.].

I recently migrated from Elasticsearch to OpenSearch 2.6. I tried setting fielddata to true through the API, but nothing changed and I still get the error.

How did you set fielddata to true for the field message? I’ve tried updating the mapping and it works:

PUT test1/_mapping
{
  "properties": { "message": { "type": "text", "fielddata": true } }
}

Let me also expand on:

Note that this can use significant memory

Fielddata will un-invert your inverted index for that field and store the result in memory, as a mapping between document IDs and terms. So if you have a lot of data (e.g. logs or other use-cases with millions to billions of documents) you might bump into some circuit breaker (fielddata or parent) or cause instability in the cluster.
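To make the "un-inverting" step concrete, here is a toy sketch in plain Python. It is not how OpenSearch implements fielddata internally; the `inverted_index` contents and the `uninvert` helper are made up for illustration. It only shows why the in-memory structure grows with the number of documents and terms in the field.

```python
from collections import defaultdict

# Toy inverted index for one text field: term -> set of document IDs.
# (Hypothetical data; a real index holds this per segment on disk.)
inverted_index = {
    "error": {1, 3},
    "timeout": {1},
    "login": {2, 3},
}

def uninvert(index):
    """Build the doc ID -> terms mapping, similar in spirit to fielddata."""
    fielddata = defaultdict(set)
    for term, doc_ids in index.items():
        for doc_id in doc_ids:
            fielddata[doc_id].add(term)
    return dict(fielddata)

# Every document in the field is visited, even if a query matches only a few.
fielddata = uninvert(inverted_index)
```

Note that `uninvert` walks the whole index regardless of which documents a query needs, which mirrors the cost described above.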

Even if you don’t have a lot of data, this structure has to be built from scratch after every refresh on the first query, so that query will have quite a delay. It’s also going to build it for all the data in that field, even if that first query only returns two documents.

The usual practice is to use doc_values for aggregations. Doc values aren't available for text fields, only for keyword fields, which means you don't get analysis (i.e. breaking the text into tokens). To work around that, some people break the text into tokens in the pipeline, before indexing data into Elasticsearch, then index it as an array of “keywords”. I know it’s not apples-to-apples, but I’m venting some ideas :slight_smile:
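A minimal sketch of that pipeline-side tokenization, assuming a very simple tokenizer (lowercase, split on non-alphanumeric characters) that does not replicate the standard analyzer. The field name `message_tokens` and both helper functions are hypothetical, chosen only for this example.

```python
import re

def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop duplicates."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sorted(set(tokens))

def prepare_document(raw_message):
    # "message" stays a text field for full-text search;
    # "message_tokens" (hypothetical name) carries the pre-split tokens.
    return {"message": raw_message, "message_tokens": tokenize(raw_message)}

# Mapping body for the index: the token array is a keyword field,
# so aggregations on it use doc_values rather than fielddata.
mapping = {
    "properties": {
        "message": {"type": "text"},
        "message_tokens": {"type": "keyword"},
    }
}

doc = prepare_document("Connection timeout: retrying connection")
```

With this layout, a terms aggregation would target `message_tokens` instead of `message`, so fielddata never needs to be enabled on the text field.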