How do you feed the Anomaly Detection Plugin existing data?

GregT · June 3, 2020, 7:50pm

When creating a “Feature”, you can preview its results based on existing data in a past time range. However, the live detector and feature only seem able to start collecting new logs and aren’t aware of existing data.

I was expecting to be able to “feed” it several years’ worth of historical data containing seasonal trends, and then visualize past date ranges to see where it would have detected anomalies. Am I missing something?

bpavani · June 4, 2020, 3:49pm

Hi @GregT,

This feature is for real-time streaming. We are working on support for historical data as well. Would you be able to share more details about your several years’ worth of data? Are you looking at month-over-month trends?

Thanks,
Pavani

GregT · June 15, 2020, 4:01pm

Hi @bpavani,

I have what I assume is a typical use case: many applications storing their logs in ELK. Our business has seasonal trends, and I’d love for anomaly detection to be aware of them so we could look at year-over-year as well as month-over-month trends. I have barely scratched the surface reading about how it works (the RCF algorithm), but the documentation alludes to it being aware of seasonal behavior, so I hope at some point we can feed it historical data and have it take that into account.

Thanks!

ylwu · June 16, 2020, 9:14pm

Hi @GregT,

Thanks for providing your use case. Currently, anomaly detection mainly takes in real-time streaming data and detects anomalies in it, so it may take some time for the model to become aware of the seasonal pattern.

Let me try to analyze your case to make sure we understand your requirements correctly and completely. Please correct me or add more use cases.

1) You already know some patterns from your historical data and want to train the model on that data first, so that when we start detecting on streaming data, the model has already learned the seasonal pattern and is ready to use.

2) Do you need to detect anomalies in historical data to verify that the model is good enough? As with general ML methods, you would split the historical data into a training set and a test set, so you can verify the results and tune the model accordingly.

3) Detect anomalies in historical data for analysis. Some historical data may not follow your seasonal behavior. Do you need to know about these historical anomalies to analyze and tune your business strategy, processes, etc.?
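For option 2, the usual ML advice applies with one twist for time series: the split should be chronological rather than random, so the model is never trained on data from the period it is later tested on. The following is a minimal Python sketch, assuming the historical data has already been exported as a list of (timestamp, value) pairs; `chronological_split` and the 80/20 ratio are hypothetical choices for illustration, not part of the Anomaly Detection plugin.

```python
from datetime import datetime, timedelta


def chronological_split(points, train_fraction=0.8):
    """Split (timestamp, value) pairs into training and test sets by time.

    Unlike a random split, no training point is later than any test point,
    so nothing from the test period leaks into training.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    ordered = sorted(points, key=lambda p: p[0])  # oldest first
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical example: 10 daily data points, deliberately supplied out of order.
start = datetime(2020, 1, 1)
data = [(start + timedelta(days=i), float(i)) for i in reversed(range(10))]
train, test = chronological_split(data)
print(len(train), len(test))      # 8 2
print(train[-1][0] < test[0][0])  # True
```

With several years of seasonal data, the training part would ideally cover at least one full seasonal cycle, so the test period can be checked against a pattern the model has already seen.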

Thanks!

amir · June 19, 2020, 8:47am

Hey everyone,

I am thinking of using ES’s Anomaly Detection for a similar use-case. I think that the features you mentioned in 2 and 3 would be very useful.

As far as I know, the current anomaly detection functionality lets the user choose whether an event was an anomaly or not. Generalizing this search across historical data is something I’d like to see, and I’m sure it would greatly benefit model accuracy.

Thanking you

ylwu · June 19, 2020, 7:36pm

Hi @amir,

Thanks for your feedback.

For feature 3, I have a draft idea: run anomaly detection on historical data with a cron job. I think some users may not need anomaly detection to be as real-time as running every 5 minutes. For example, they could run anomaly detection on last week’s data and schedule the task at night or during other cluster idle time. We could create a weekly cron job and let the user specify the run time. The job would replay last week’s data points and find anomalies, and users could review the job’s progress and results in Kibana. One benefit I see is that users can choose to run the detector during system idle time, either once or periodically, and the model is cleared once the job is done rather than being held in memory all the time.

What do you think of this idea? Any comments or new ideas are welcome. If you have other use cases, feel free to post them here.
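The replay job described above can be sketched in a few lines: compute last week’s time window, feed that window’s points to a streaming detector in timestamp order, collect the flagged timestamps, and then drop the detector. This is a minimal Python sketch under stated assumptions, not the plugin’s implementation: it uses a simple rolling z-score as a stand-in for the plugin’s Random Cut Forest model, and `RollingZScore`, `last_week_window`, `replay_window`, the window size of 5 and the threshold of 3.0 are all hypothetical.

```python
import math
from collections import deque
from datetime import datetime, timedelta


class RollingZScore:
    """Toy streaming detector (a stand-in for RCF, not the real model): flags a
    value whose z-score against the previous `size` values exceeds `threshold`."""

    def __init__(self, size=5, threshold=3.0):
        self.window = deque(maxlen=size)
        self.threshold = threshold

    def update(self, value):
        """Score `value` against the values seen so far, then remember it."""
        is_anomaly = False
        if len(self.window) == self.window.maxlen:  # warm-up: wait for a full window
            mean = sum(self.window) / len(self.window)
            std = math.sqrt(sum((x - mean) ** 2 for x in self.window) / len(self.window))
            if std == 0:
                is_anomaly = value != mean  # any change from a flat history is unusual
            else:
                is_anomaly = abs(value - mean) / std > self.threshold
        self.window.append(value)
        return is_anomaly


def last_week_window(now):
    """Return (start, end): the 7 days ending at midnight of the day of `now`."""
    end = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return end - timedelta(days=7), end


def replay_window(points, start, end, make_detector=RollingZScore):
    """Replay (timestamp, value) points in [start, end) in time order and return
    the flagged timestamps. The detector exists only for this job, so no model
    is kept in memory afterwards."""
    detector = make_detector()
    in_range = sorted(p for p in points if start <= p[0] < end)
    return [ts for ts, value in in_range if detector.update(value)]


# Hypothetical hourly metric with a repeating 10/11/12 pattern and three spikes;
# only the spike at hour 100 falls inside last week's window.
start, end = last_week_window(datetime(2020, 6, 22, 9, 30))
points = []
for h in range(-24, 8 * 24):
    value = 50.0 if h in (-5, 100, 7 * 24 + 5) else 10.0 + h % 3
    points.append((start + timedelta(hours=h), value))

flagged = replay_window(points, start, end)
print(flagged)  # [datetime.datetime(2020, 6, 19, 4, 0)]
```

Creating the detector inside `replay_window` mirrors the benefit mentioned above: the model state only lives for the duration of the scheduled run.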

Thanks!

amir · June 24, 2020, 7:34am

The project I’m currently working on has a use case for both scenarios (real-time anomaly detection and wider anomaly detection analysis done in off-hours). I think it would be beneficial to implement more ML alongside anomaly detection for the off-hours analysis. Here is how I planned to use these features:

Real-time use case:
I was hoping to use real-time anomaly detection to warn the operations team about early signs of a possible system outage, so they could take appropriate action to mitigate the risk before it actually happens.

Cluster idle time use case 1:
A lot of business logic data is captured through app logs that my system ingests. I think it would be great if I could use the existing data to build a regression model that predicts certain business parameters, such as the expected number of transactions. If we regard a higher-than-expected number of transactions as an anomaly, I think it would be beneficial to capture which other parameters contributed to, or hinted at, the appearance of those anomalies. With that knowledge, I’d like to provide my clients with information such as: based on the values of param1, param2 and param3, we expect the number of transactions to be X in the specific time period.
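The regression idea in this use case can be illustrated with an ordinary least-squares fit: predict the expected number of transactions from a logged parameter, then flag periods where the actual count exceeds the prediction by more than a chosen margin. This is a minimal Python sketch, assuming a single predictor for brevity (the post mentions several parameters, which would call for multiple regression); `fit_line`, `flag_higher_than_expected`, the margin of 10 and all sample numbers are hypothetical, and none of this is something the Anomaly Detection plugin provides.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_higher_than_expected(xs, ys, slope, intercept, margin):
    """Return indices where the actual value exceeds the prediction by more than `margin`."""
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if y - (slope * x + intercept) > margin]


# Hypothetical history: values of param1 and the transaction counts seen with them.
param1 = [1, 2, 3, 4, 5]
transactions = [12, 22, 32, 42, 52]
slope, intercept = fit_line(param1, transactions)  # 10.0, 2.0

# New period: the third observation is far above what param1 predicts (42).
new_param1 = [2, 3, 4]
new_transactions = [23, 31, 80]
print(flag_higher_than_expected(new_param1, new_transactions, slope, intercept, margin=10))  # [2]
```

Only positive residuals are flagged here because the post treats higher-than-expected counts as the anomalies of interest; a symmetric check would compare the absolute residual instead.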

I hope you find this information useful.

Thank you for taking the time to go through all of this. I’m a great fan of the work the OpenDistro team is doing, and I’m eagerly waiting to hear what new features you will come up with next.

ylwu · June 24, 2020, 11:57pm

Hi @amir,

Thanks for sharing your use cases and your really good suggestions. We will discuss your use cases and update our ODFE roadmap if we plan to put resources on them. Don’t hesitate to tell us if you have new use cases, find any bugs, or have suggestions. Any contributions on GitHub are also welcome; as you know, ODFE is completely open source.

Thanks

ganesh · August 27, 2020, 11:17am

Could I get the rule set used for anomaly detection in the system/network domain? I need a use case for detecting failed SSH logins on a system.

ylwu · August 27, 2020, 4:47pm

Hi @ganesh, can you explain your question in more detail? Do you want to SSH into an ES node? Your question doesn’t seem related to this thread. How about creating a new topic?
