Versions:
- OpenSearch 2.19.0
Describe the issue:
We’re seeing several GB of _id fielddata loaded on data nodes in our production cluster. _cat/fielddata?v&s=size:desc shows _id as the top fielddata consumer.
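For anyone wanting to reproduce the measurement, here is a minimal sketch of how the per-node _id fielddata totals could be collected. It assumes the cluster is reachable over plain HTTP without authentication at a `base_url` you supply, and that `_cat/fielddata` is called with `format=json` and `bytes=b` so sizes come back as raw byte counts. The function names are hypothetical helpers, not part of any OpenSearch client.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_cat_fielddata(base_url, field="_id"):
    """Fetch _cat/fielddata rows as JSON with sizes in raw bytes.

    Assumption: no auth/TLS setup is needed for base_url.
    """
    url = f"{base_url}/_cat/fielddata?format=json&bytes=b&fields={field}"
    with urlopen(url) as resp:
        return json.load(resp)


def fielddata_bytes_by_node(rows, field="_id"):
    """Sum fielddata bytes per node for one field, largest first."""
    totals = defaultdict(int)
    for row in rows:
        if row.get("field") == field:
            totals[row["node"]] += int(row["size"])
    # Sort descending by size so the heaviest node is listed first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Calling `fielddata_bytes_by_node(fetch_cat_fielddata("http://localhost:9200"))` on a schedule and comparing successive snapshots may help narrow down when the growth happens, which can then be correlated with application traffic.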
We also see circuit breaker errors:
[FIELDDATA] Data too large, data for [_id] would be [X bytes], which is larger than the limit of [Y bytes]
Something is forcing _id to load into heap via fielddata. However, we can’t find the source — our slow logs show no queries that sort or aggregate on _id.
We also run many Painless scripts inside aggregations and are unsure if any of those could implicitly trigger _id fielddata loading.
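Since the slow logs are not an option, one alternative is to audit the query bodies on the application side. The sketch below walks a search request body and reports where _id appears as a sort key, as the `field` of an aggregation, or as `doc['_id']` inside a script source. These three patterns are an assumption about what is worth checking, not a confirmed list of what loads _id fielddata, and `find_id_references` is a hypothetical helper.

```python
import re

# Matches doc['_id'] or doc["_id"] inside a script source string.
ID_IN_SCRIPT = re.compile(r"""doc\s*\[\s*['"]_id['"]\s*\]""")


def find_id_references(node, path="$", in_sort=False):
    """Return JSON-style paths in a search body that reference _id
    via sort, an aggregation 'field', or doc['_id'] in a script."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if in_sort and key == "_id":
                hits.append(child)            # e.g. {"sort": [{"_id": "asc"}]}
            elif key == "field" and value == "_id":
                hits.append(child)            # e.g. terms agg on _id
            else:
                hits.extend(
                    find_id_references(value, child, in_sort or key == "sort")
                )
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(find_id_references(item, f"{path}[{i}]", in_sort))
    elif isinstance(node, str):
        if (in_sort and node == "_id") or ID_IN_SCRIPT.search(node):
            hits.append(path)
    return hits
```

A plain `term` or `ids` query on _id is deliberately not flagged, since the check only looks at sort, aggregation fields, and script sources.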
How can we identify which queries are loading _id fielddata?
We can’t enable slow log at DEBUG/TRACE level — the query volume would overwhelm the cluster. Is there another approach? For example:
- Does temporarily lowering indices.breaker.fielddata.limit to a very small value cause the exception stack to include the query source?
- Can _nodes/hot_threads during high fielddata periods help trace it?
- Are there any known internal operations that sort on _id?
- Can Painless scripts using params._source implicitly trigger _id fielddata?
Any guidance appreciated.