Hi there,
I'm a beginner here with a lot of questions; I hope someone can give me some insight.
Is the ML model built from historical data, or from the data that arrives after we click Start detector?
Is the ML model based on a sample of the anomaly history, or on the whole index data?
Will the ML model keep updating itself, or does it settle down after the first training?
Currently I have Open Distro set up locally with Docker: Open Distro 1.11.0 + nginx + filebeat.
What I am trying to do (a rough sketch of the detector config follows the list):
- count the number of documents
- raise an anomaly when a huge number of documents suddenly comes in
- or when the document count suddenly drops
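Here is a rough sketch of the detector I have in mind, created through the REST API from Python. The host, credentials, and the `@timestamp` field are placeholders from my local Docker setup, so treat them as assumptions:

```python
import json
import requests

# Minimal sketch: create a detector whose only feature counts documents.
# Assumes Open Distro on https://localhost:9200 with the Docker image's
# default admin/admin credentials, and filebeat writing an @timestamp field.
detector = {
    "name": "doc-count-detector",
    "description": "Flag sudden spikes or drops in document volume",
    "time_field": "@timestamp",
    "indices": ["filebeat*"],
    "feature_attributes": [
        {
            "feature_name": "doc_count",
            "feature_enabled": True,
            # value_count on the time field effectively counts documents
            "aggregation_query": {
                "doc_count": {"value_count": {"field": "@timestamp"}}
            },
        }
    ],
    "detection_interval": {"period": {"interval": 10, "unit": "Minutes"}},
    "window_delay": {"period": {"interval": 1, "unit": "Minutes"}},
}

resp = requests.post(
    "https://localhost:9200/_opendistro/_anomaly_detection/detectors",
    auth=("admin", "admin"),
    verify=False,  # the Docker image ships self-signed certs
    headers={"Content-Type": "application/json"},
    data=json.dumps(detector),
)
print(resp.json())
```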
First I fed in sample logs (normally all returning 200).
But somehow it fails: the last two points, which I expected to be flagged as anomalies, were not.
I checked the AWS example on YouTube: https://youtu.be/V1MRY5X-Anw?t=2093
It seems he turns each HTTP status into its own isolated field; normally, though, there is only one status field containing all kinds of status codes.
I don't see that the model level can apply a filter range.
Does this mean we need to do it at the detector level? (a sketch of what I mean is below)
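To illustrate, this is the only filtering option I can find: the detector-level `filter_query`. Only documents matching it feed the model. The field name `http.response.status_code` is my guess at what the filebeat nginx module produces, so check your actual mapping:

```python
import json
import requests

# Sketch: detector-level filter_query restricting the model to error docs.
detector = {
    "name": "error-rate-detector",
    "time_field": "@timestamp",
    "indices": ["filebeat*"],
    # Assumed field name; adjust to your mapping.
    "filter_query": {"range": {"http.response.status_code": {"gte": 400}}},
    "feature_attributes": [
        {
            "feature_name": "error_count",
            "feature_enabled": True,
            "aggregation_query": {
                "error_count": {"value_count": {"field": "@timestamp"}}
            },
        }
    ],
    "detection_interval": {"period": {"interval": 10, "unit": "Minutes"}},
}

resp = requests.post(
    "https://localhost:9200/_opendistro/_anomaly_detection/detectors",
    auth=("admin", "admin"),
    verify=False,
    headers={"Content-Type": "application/json"},
    data=json.dumps(detector),
)
print(resp.json())
```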
Thanks for using the product and asking the questions, Vincent!
Here are answers to some of the questions.
- The ML model is based on historical data when possible. If there is none, it uses the data that arrives after the detector is created.
- The ML model uses some recent data, on the order of a few hundred data points.
- The ML is a streaming algorithm and keeps updating on new data.
- Those should be anomalies. One possible reason the detector doesn't raise them is that it raised a few anomalies shortly before: by design, the expected number of anomalies is around 0.5% of all data points, so when many anomalies occur in a short period, only the earlier ones are likely to be identified. It is also possible that the model has already seen similar data in the indexed history.
- A feature needs to have a numerical value so that all the features together have a known, fixed total dimension; an unknown number of dimensions is not supported. (See the sketch below for one way to encode status codes as fixed features.)
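For example, rather than one field holding an arbitrary set of status codes, you can give the model a fixed set of numeric features, one per status class. A rough sketch in Python; the field name, and whether your plugin version accepts scripted metric aggregations here, are assumptions to verify:

```python
# Sketch: one numeric feature per HTTP status class, so the total feature
# dimension is fixed at three. Each feature is a plain metric aggregation
# (sum of a 0/1 painless script).
def status_class_feature(name, low, high):
    script = (
        f"long s = doc['http.response.status_code'].value; "
        f"return (s >= {low} && s < {high}) ? 1 : 0;"
    )
    return {
        "feature_name": name,
        "feature_enabled": True,
        "aggregation_query": {name: {"sum": {"script": {"source": script}}}},
    }

feature_attributes = [
    status_class_feature("count_2xx", 200, 300),
    status_class_feature("count_4xx", 400, 500),
    status_class_feature("count_5xx", 500, 600),
]
print(feature_attributes)
```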
Thanks very much. I've been researching these questions and testing a few times.
About points 3 and 4, I still have some questions…
- Does that mean Open Distro is not using a fixed training model?
For example, I have 7 days of data and then perform anomaly detection.
After another 7 days, what happens: does it keep updating the old model, or does it mix the two weeks into a new model?
If it mixes the two weeks into a new model…
Based on your reply, does that mean that if a similar pattern has been seen before, anomaly detection will not consider it an anomaly in the future?
Is the model stored in another index? If so, even if I lose or delete all the data, could I back up the model and restore it somewhere else?
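If it helps: from what I can find in the docs (I may be wrong), the plugin keeps its state in hidden indices, and they can at least be listed like this (host and credentials are again my local Docker defaults):

```python
import requests

# Sketch: list the plugin's internal indices (detector configs, results,
# model checkpoints). The index pattern is taken from the docs; verify it
# against your cluster.
resp = requests.get(
    "https://localhost:9200/_cat/indices/.opendistro-anomaly*?v",
    auth=("admin", "admin"),
    verify=False,
)
print(resp.text)
```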
I recently found out that after a period of not sending data/logs to the index, the anomaly detection stops for good and cannot be resumed, even if I resume sending logs or save the detector again; the only fix is to create another detector with the same features.
Thanks for the replies; they have already helped me a lot and saved me a lot of time.
Here are my steps:
- I started the test and injected traffic on 11/23 from 11:00 to 12:00.
- I created the anomaly detector (with index filebeat*).
- At 12:00 I stopped injecting traffic and left the setup as it was.
- Some time later, the anomaly detector reported "Data is not being ingested correctly".
- At 11/24 09:00 I resumed the traffic, but the anomaly detector did not resume; it kept showing "Data is not being ingested correctly".
- At 11/24 10:09 I stopped and restarted the anomaly detector. It keeps showing "Initializing", even though I can confirm data has been ingested for over 30 minutes.
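In case it helps with debugging, this is roughly how the detector state can be checked outside Kibana, via the profile API (a minimal sketch; host, credentials, and the detector ID are placeholders from my local setup):

```python
import requests

DETECTOR_ID = "REPLACE_ME"  # copy the ID from the detector's URL in Kibana

# The profile API reports the detector's current state and last error,
# which should show why it is stuck on "Initializing".
resp = requests.get(
    f"https://localhost:9200/_opendistro/_anomaly_detection/detectors/{DETECTOR_ID}/_profile",
    auth=("admin", "admin"),
    verify=False,
)
print(resp.json())
```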
Could you share your feature configuration?