I noticed that the detector was triggering many false positives, and that this started after I changed the index’s text fields to keyword. Upon investigation, I found that the rules were replacing whitespace with the “_ws_” escape sequence. To test this, I created two indexes, each with just one attribute. In one index the datatype is keyword, and in the other it is text. I also created a test rule.
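For anyone who wants to reproduce the setup, the two index bodies could look roughly like the sketch below. This is only an illustration: the helper name `single_field_mapping` is made up, and the field path is assumed from the ingested documents shown later in this post.

```python
import json


def single_field_mapping(field_type: str) -> dict:
    """Build an index body with one field, log.attributes.companyName.

    Hypothetical helper for illustration; the field path is an assumption
    taken from the sample documents in this post.
    """
    return {
        "mappings": {
            "properties": {
                "log": {
                    "properties": {
                        "attributes": {
                            "properties": {
                                "companyName": {"type": field_type}
                            }
                        }
                    }
                }
            }
        }
    }


# One body per test index: same field, different datatype.
keyword_index_body = single_field_mapping("keyword")
text_index_body = single_field_mapping("text")

print(json.dumps(keyword_index_body, indent=2))
```

Each body would be sent when creating the corresponding index; the index names themselves are left out because they are not given here.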
Here’s an example of the detection logic in the rule:
detection:
  condition: Selection_1
  Selection_1:
    companyName|all:
      - microsoft corp
Here’s the detection query generated by Security Analytics:
"query": "companyName: \"microsoft_ws_corp\""
Since keyword fields are not analyzed, my understanding is that the keyword detector wasn’t triggered because “_ws_” isn’t present in the ingested document:
{
  "log.attributes.companyName": "microsoft corp"
}
My text detector did work, though. I think that’s because text fields have analyzers.
To test my theory about whitespace, I ingested the following document into the keyword index, and a finding was generated.
{
  "log.attributes.companyName": "microsoft_ws_corp"
}
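The behaviour I observed on the keyword index can be modelled with a small sketch. It is a simplified illustration, not the plugin’s actual code: `escape_whitespace` is a hypothetical stand-in for the rule translation step, and `keyword_matches` assumes a keyword field stores the whole value as a single term that must match exactly. It does not attempt to model the text index or how the detector runs its queries internally.

```python
WS_TOKEN = "_ws_"  # escape sequence seen in the generated detection query


def escape_whitespace(value: str) -> str:
    """Hypothetical stand-in for the rule translation: spaces become _ws_."""
    return value.replace(" ", WS_TOKEN)


def keyword_matches(stored_value: str, query_term: str) -> bool:
    """Simplified keyword-field model: the whole value is one term,
    so it only matches when it is exactly equal to the query term."""
    return stored_value == query_term


# Value from the test rule, after the whitespace replacement.
query_term = escape_whitespace("microsoft corp")
print(query_term)  # microsoft_ws_corp

# The two documents ingested into the keyword index.
for stored in ("microsoft corp", "microsoft_ws_corp"):
    print(repr(stored), "matches:", keyword_matches(stored, query_term))
# 'microsoft corp' matches: False
# 'microsoft_ws_corp' matches: True
```

Under this model, only the document that literally contains “_ws_” matches, which is consistent with the finding I got on the keyword index.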
I also ran the exact query string against both indexes, but no documents were returned from either of them; even the document referenced in the finding wasn’t returned. Maybe the way detectors query the indexes is different from what I thought. Anyway, that’s a topic for another day.
Shouldn’t OpenSearch handle the difference between text and keyword fields in Security Analytics? I assumed the escape sequences are kept in place so that they can be handled appropriately for each field type. I also found this exact issue raised on GitHub back in May 2024: SIGMA rule translation -> lucene query replaces spaces " " with "_ws_" which lucene doesnt understand. · Issue #1024 · opensearch-project/security-analytics · GitHub. Someone tried to fix the issue, but they ultimately gave up.
What can be done to make the detector work properly for keyword fields? I know reverting to text fields is an option, but do I have any others? I explored using custom analyzers, but my application runs a lot of queries against these indexes, so I fear all of them would be affected. Is there any solution to this? And why is the space replaced with “_ws_” in the first place, which ultimately causes the detector to fail for keyword fields?