Count terms in features of anomaly detectors

Hi guys

I have a use case to detect anomalies in log files of denied remote access. I would create a detector with a filter on the corresponding log message and add a feature with a “count” aggregation on the client_ip field.

According to the YouTube video, it is not possible to count terms such as IP addresses in features, only numeric values. Is that still true? I think that with the “count” aggregation, which is mapped to the Elasticsearch “value_count” aggregation, it should be possible to count such non-numeric fields.

Thanks for the clarification.

Kind regards
Elmar

You can use any field type that the ES count aggregation supports, with an expression.
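A minimal sketch of what such an expression could look like for the use case above, assuming client_ip is mapped as an ip or keyword field. The aggregation name denied_client_ip_count is made up for illustration, and the exact form the expression editor expects may differ between versions:

```json
{
  "denied_client_ip_count": {
    "value_count": {
      "field": "client_ip"
    }
  }
}
```

This counts the values of client_ip in each detection interval, so it returns one number per interval rather than a list of IP addresses.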

Thanks for the answer.
Is it also possible to use a terms aggregation?

"status_codes": {
  "terms": {
    "field" : "status_code"
  }
}

Is the anomaly detection engine able to handle a buckets array with values? See result:

"status_codes": {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 53,
  "buckets" : [
    {
      "key" : 200,
      "doc_count" : 4583
    },
    {
      "key" : 301,
      "doc_count" : 4501
    },
    [...]
  ]
}

Currently, the feature query only supports single-value aggregations. That means the aggregation should return exactly one numeric value, e.g. max/min/sum/average/count. You can’t use a terms aggregation, because it returns a bucket array.
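As a rough illustration, a single-value counterpart to the terms query on the same status_code field might look like the sketch below. The name status_code_count is hypothetical:

```json
{
  "status_code_count": {
    "value_count": {
      "field": "status_code"
    }
  }
}
```

Instead of a "buckets" array, Elasticsearch returns a single "value" for this aggregation. The per-status-code breakdown is lost, which is the trade-off of a single-value feature.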

Thanks for the clarification!

Hi @Elmux,

Have you tried the high-cardinality feature? I believe that should address your requirement. Let me know if it doesn’t.

Thanks,
Pavani

What about adding a category breakdown on top of the count() function?

I’m trying to get anomalies in the document counts for security logs for each category (INDEX, login failed, BAD SSL, etc.).

Thanks

You can use our high-cardinality detector by specifying the error category as the category field.
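A rough sketch of a detector definition along those lines, as it might be sent to the create-detector API (POST _opendistro/_anomaly_detection/detectors). Every name here is an assumption for illustration: the index pattern security-logs-*, the @timestamp time field, the error_category field, and the interval values. The category field is assumed to be mapped as keyword or ip:

```json
{
  "name": "security-log-count-by-category",
  "description": "Document count per error category",
  "time_field": "@timestamp",
  "indices": ["security-logs-*"],
  "feature_attributes": [
    {
      "feature_name": "doc_count",
      "feature_enabled": true,
      "aggregation_query": {
        "doc_count": {
          "value_count": {
            "field": "@timestamp"
          }
        }
      }
    }
  ],
  "detection_interval": {
    "period": { "interval": 10, "unit": "Minutes" }
  },
  "window_delay": {
    "period": { "interval": 1, "unit": "Minutes" }
  },
  "category_field": ["error_category"]
}
```

The feature itself stays a single-value count. The category_field setting is what splits the data so that each category value (INDEX, login failed, BAD SSL, and so on) is modeled separately.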