I'm a beginner here with a lot of questions; I hope someone can give me some insight.
Is the ML model built from historical data, or from the data that arrives after we click Start detector?
Is the ML model based on a sample of the anomaly history or on the whole index's data?
Will the ML model keep updating itself, or is it fixed once it settles down the first time?
Currently I have set up Open Distro locally with Docker: Open Distro 1.11.0 + nginx + Filebeat.
What I am trying to do:
count the number of docs,
and flag an anomaly when a huge number of docs suddenly comes in,
or when the doc count suddenly drops.
First I provided sample logs (normally all returning 200).
But somehow it seems to fail; I expected the last two to be flagged as anomalies.
I checked the AWS example on YouTube: Real-Time Anomaly Detection on Your Log Data Using Amazon Elasticsearch Service - YouTube
It seems he turns each HTTP status into an isolated field; however, normally there should be only one
status field that contains all kinds of status codes.
I don't see that the model level can do a filter range.
Does this mean we need to do it at the detector level?
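For what it's worth, the Open Distro anomaly detection API does accept a detector-level filter_query when you create a detector. Here is a rough sketch of what that could look like; the index pattern, time field, and the http.response.status_code field name are assumptions that depend on your Filebeat mapping, so adjust them to your data:

```json
POST _opendistro/_anomaly_detection/detectors
{
  "name": "nginx-status-detector",
  "time_field": "@timestamp",
  "indices": ["filebeat-*"],
  "filter_query": {
    "range": { "http.response.status_code": { "gte": 400 } }
  },
  "detection_interval": { "period": { "interval": 10, "unit": "Minutes" } },
  "feature_attributes": [
    {
      "feature_name": "doc_count",
      "feature_enabled": true,
      "aggregation_query": {
        "doc_count": { "value_count": { "field": "http.response.status_code" } }
      }
    }
  ]
}
```

With a filter like this, the detector only sees the 4xx/5xx documents, so a single numeric feature (the doc count) is enough to catch sudden spikes or drops in error traffic.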
Thanks for using the product and asking the questions, Vincent!
Here are the answers to some of your questions.
The ML model is built from historical data if possible. If not, it uses the data that arrives after the detector is created.
The ML model uses recent data, on the order of a few hundred data points.
The ML is a streaming algorithm and will keep updating with new data.
Those should be anomalies. One possible reason the detector doesn't raise them is that it had just raised a few anomalies shortly before. By design, the expected number of anomalies is around 0.5% of all data points. So if there are many anomalies in a short period of time, only the earlier ones are likely to be identified. It is also possible that the model had already seen similar data in the indexed history.
A feature needs to have a numerical value so that all features together have a known, fixed total dimension. An unknown number of dimensions is not supported.
Thanks very much. I've been searching on these few questions and testing a few times.
About points 3 and 4 I still have some questions…
Does that mean Open Distro is not using a fixed training model?
For example, I have 7 days of data and then perform anomaly detection.
After another 7 days, which of these will happen?
The model becomes a mix of these 2 weeks <== from your answer I assume this is what happens?
Week 2 becomes the new model
The model keeps the week-1 pattern
If it is the 2 weeks mixed into a new model…
Based on your reply, does that mean that if a similar pattern has appeared before, the anomaly detection will not consider it an anomaly in the future?
Is the model stored in another index? So even if I lose or delete all the data, can I back up the model and restore it somewhere else?
Recently I found that after a period of not sending data/logs to the index, the anomaly detection stops forever and cannot be resumed, even if I resume sending logs or save the detector again, unless I create another detector with the same features.
No, the Open Distro model is not fixed. It is by design a streaming algorithm that learns from live data and identifies anomalies with respect to current data. So in the example you gave, the model will learn from the data of recent weeks. The more recent a data point is, the more likely the model is to remember it. It uses weighted reservoir sampling, if you want to read more.
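To illustrate the idea, here is a toy sketch of weighted reservoir sampling with a recency bias. This is not the actual Random Cut Forest implementation; the exponential weighting and the decay parameter are assumptions chosen just to demonstrate how a fixed-size sample can end up dominated by recent points:

```python
import heapq
import math
import random

def recency_biased_sample(stream, k, decay=0.01):
    """Keep a k-point sample that favors recent observations.

    Each point i gets weight w_i = exp(decay * i), so later points
    weigh more.  Uses the A-Res weighted reservoir scheme: draw
    u ~ Uniform(0, 1) per point and keep the k largest u ** (1 / w).
    """
    heap = []  # min-heap of (key, point); smallest key is evicted first
    for i, x in enumerate(stream):
        weight = math.exp(decay * i)
        key = random.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, x))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, x))
    return [x for _, x in heap]
```

Fed two weeks of data points, a sample like this is mostly drawn from the second week, which matches the answer above: the model reflects both weeks mixed, skewed toward the recent data.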
It depends: if the pattern has occurred before but is still rare, the model may still identify it as an anomaly. If the pattern recurs often enough or lasts long enough, the model may learn it as normal.
The model is stored in a separate index used for checkpoints (.opendistro-anomaly-checkpoints).
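If you want to back the model up, one possible approach is a regular snapshot of the checkpoint index. This is only a sketch: my_backup is a placeholder for a snapshot repository you have already registered, and the checkpoint index name can vary between versions, so verify it on your cluster first:

```json
PUT _snapshot/my_backup/ad-checkpoints-1
{
  "indices": ".opendistro-anomaly-checkpoints",
  "include_global_state": false
}
```

Note that restoring model checkpoints into a cluster where the original source data is gone is not something I can vouch for; treat it as an experiment rather than a supported workflow.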
Since the detector cannot run without a live data stream, it may stop when there is no data. But it can be restarted by manually stopping and starting the detector; you shouldn't need to recreate a new one. I am not sure whether what you saw is caused by an implementation issue. If you can confirm that the new data stream is available but the restarted detector still errors out, that is a software issue. You can create an issue for tracking in the GitHub repo.
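For the manual restart, the plugin exposes stop and start endpoints on the detector; replace <detectorId> below with your detector's ID (you can find it in the Kibana anomaly detection UI or by searching the detectors API):

```json
POST _opendistro/_anomaly_detection/detectors/<detectorId>/_stop

POST _opendistro/_anomaly_detection/detectors/<detectorId>/_start
```

If _start returns an error even though fresh documents are arriving in the source index, that would be the kind of reproducible case worth filing on GitHub.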