Finding all distinct values of a text field and their counts

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.5, Chrome 115.0.5790.170

Describe the issue:
I want to find all existing values of a text field, along with their counts, where the field is not mapped as a keyword or marked as aggregatable. Many solutions suggest updating my index to map these text fields as keyword so they become aggregatable, but doing so would greatly stress my cluster.

Whenever I preview the text field in OpenSearch Discover, it shows a sample of the top 5 aggregated values, as shown. Is it possible to expand upon that?

@rhino I don’t think the Discover display is changeable.

Could you share the mapping of the highlighted field?

If you store the field in _source, you should be able to do a terms aggregation on a script: Terms aggregation | Elasticsearch Guide [7.10] | Elastic

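In your case, the script would be something like params._source.FIELD_NAME (aggregation scripts read the source through params; ctx._source is only available in update scripts).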

This would make Elasticsearch go ahead and unpack _source for each document, so it will be painfully slow, but if you don’t have a ton of data, it might just work.
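To make the suggestion above concrete, here is a minimal sketch that only builds the search body for such a scripted terms aggregation and prints it as JSON. The function name `build_distinct_values_query`, the aggregation name `distinct_values`, and the `max_values` limit are made up for illustration, and "action" is just one of the fields mentioned in this thread. It assumes the field is present in _source and that the script can reach it via `params._source`; I have not run it against a real cluster, so treat it as a starting point.

```python
import json


def build_distinct_values_query(field_name, max_values=1000):
    """Build a search body that buckets documents by a value read from _source.

    field_name and max_values are illustrative parameters, not part of any API.
    """
    return {
        # Skip the hits; only the aggregation buckets are of interest.
        "size": 0,
        "aggs": {
            "distinct_values": {
                "terms": {
                    "script": {
                        "lang": "painless",
                        # Assumption: the aggregation script exposes the
                        # document source as params._source.
                        "source": "params._source[params.field]",
                        "params": {"field": field_name},
                    },
                    # Upper bound on the number of buckets returned.
                    "size": max_values,
                }
            }
        },
    }


body = build_distinct_values_query("action")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a `_search` request against the index pattern. Because every matching document has its _source loaded, it is probably worth adding a query (for example a narrow time range) to the body first and seeing how the cluster copes before widening it.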

Thanks, all, for replying. I do have a ton of data, so a terms aggregation may stress the cluster a lot. Below is the current mapping of the fields; I am trying to make the text fields “action” and “taker” aggregatable:

{
  "index_patterns": [
    "kubernetes-*"
  ],
  "template": {
    "settings": {
      "index.number_of_shards": "32",
      "index.number_of_replicas": "1",
      "index.mapping.ignore_malformed": "true"
    },
    "mappings": {
      "dynamic": false,
      "properties": {
        "dd": {
          "dynamic": true,
          "type": "object"
        },
        "kubernetes": {
          "properties": {
            "container_name": {
              "type": "keyword"
            },
            "container_hash": {
              "type": "keyword"
            },
            "host": {
              "type": "keyword"
            },
            "docker_id": {
              "type": "keyword"
            },
            "pod_id": {
              "type": "keyword"
            },
            "container_image": {
              "type": "keyword"
            },
            "labels": {
              "dynamic": true,
              "type": "object"
            },
            "namespace_name": {
              "type": "keyword"
            },
            "pod_name": {
              "type": "keyword"
            }
          }
        },
        "exc_info": {
          "type": "text"
        },
        "log": {
          "type": "text"
        },
        "dns": {
          "dynamic": true,
          "type": "object"
        },
        "collection": {
          "type": "keyword"
        },
        "message": {
          "type": "text"
        },
        "error": {
          "type": "text"
        },
        "collection_id": {
          "type": "integer"
        },
        "app_context": {
          "dynamic": true,
          "type": "object"
        },
        "application_name": {
          "type": "keyword"
        },
        "filename": {
          "type": "keyword"
        },
        "lineno": {
          "type": "keyword"
        },
        "relayId": {
          "type": "keyword"
        },
        "stream": {
          "type": "keyword"
        },
        "revert_reason": {
          "type": "keyword"
        },
        "name": {
          "type": "keyword"
        },
        "action": {
          "type": "keyword"
        },
        "payment": {
          "dynamic": true,
          "type": "object"
        },
        "taker": {
          "type": "keyword"
        },
        "levelname": {
          "type": "keyword"
        },
        "queue": {
          "type": "keyword"
        },
        "event_handler": {
          "type": "keyword"
        }
      }
    }
  },
  "composed_of": [],
  "priority": 1,
  "data_stream": {
    "timestamp_field": {
      "name": "@timestamp"
    }
  },
  "name": "kubernetes-logs"
}