Multi-variate Anomaly Detection

Hi all,

My team and I are looking into OpenSearch anomaly detection and have some questions we hoped the community could help with.

We have event data with a structure similar to:

{"timestamp": "2022-09-20T15:00:00.000Z", "eventType": "click", "component": "some.api.name"}

We had intended to define eventType and component as OpenSearch anomaly detection features, but realized that the values contained in each field are not considered.

Am I right in saying that we would need to define a detector and model for each of the following:

{"timestamp": "...", "eventType": "click", "component": "component.A"}
{"timestamp": "...", "eventType": "search", "component": "component.A"}
{"timestamp": "...", "eventType": "click", "component": "component.B"}
{"timestamp": "...", "eventType": "search", "component": "component.B"}

Are there more scalable approaches that aggregate on the distinct values of each field?

Thanks for the help!

(admin) Moved to machine-learning sub-category.

@ylwu - could you or the team help on this?

@Kenton Thanks for your question. I'm not sure I understand “defining eventType and component as OpenSearch anomaly detection features” correctly. Since “eventType” and “component” are not numeric types, do you want to use the count of these values as features?


Hi,

Did you find a solution? I’m looking for the same kind of solution :slight_smile:

You can specify eventType and component as category fields when you create the detector. We then aggregate on the distinct values of eventType and component and create a separate model for each combination.
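For anyone landing here later, a minimal sketch of what such a detector definition might look like. It only builds and prints the request body that would be sent to `POST _plugins/_anomaly_detection/detectors`; it does not call a cluster. The detector name, the `events-*` index pattern, the interval values, and the choice of a `value_count` on `timestamp` as the single feature are all assumptions for illustration, not something confirmed in this thread. It also assumes eventType and component are mapped as `keyword`.

```python
import json


def build_detector_body(index_pattern: str) -> dict:
    """Build a create-detector request body that uses two category fields.

    All concrete values here (name, intervals, feature) are illustrative
    assumptions; adjust them to your own data.
    """
    return {
        "name": "events-by-type-and-component",  # hypothetical name
        "description": "Event counts per eventType and component",
        "time_field": "timestamp",
        "indices": [index_pattern],
        "feature_attributes": [
            {
                "feature_name": "event_count",
                "feature_enabled": True,
                # Count the events in each detection interval. The feature
                # itself is numeric; the string fields are only used to
                # split the data, not as features.
                "aggregation_query": {
                    "event_count": {"value_count": {"field": "timestamp"}}
                },
            }
        ],
        "detection_interval": {"period": {"interval": 10, "unit": "Minutes"}},
        "window_delay": {"period": {"interval": 1, "unit": "Minutes"}},
        # One model per distinct (eventType, component) pair, instead of
        # one hand-built detector per pair.
        "category_field": ["eventType", "component"],
    }


body = build_detector_body("events-*")  # hypothetical index pattern
print(json.dumps(body, indent=2))
```

With this shape, the four click/search and component.A/component.B combinations from the original question would be covered by one detector rather than four.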
