Create a DSL query to return content with special characters

Versions: OpenSearch 2.9.0 / Dashboards 2.9.0 / Ubuntu (Docker) / Chrome and Safari

Describe the issue:
I’m trying to create an alert that is triggered by data in the message field, but I can’t filter the documents I need from the index with a query.
In my case I want to return values like:
[Classification: Attempted Information Leak] [Priority: 2] {icmp}>
I also want to see only the cases where this happened inside our internal network, not outside it, e.g.:
[Classification: Attempted Information Leak] [Priority: 2] {icmp}>

So I planned to filter the values with a query like:

"match_phrase": {
                "message": {
                "query": "->10.*"

This is only the part of the message that indicates the action happened inside the internal network, but neither the DSL nor Lucene syntax lets me build a query containing the character “>”.
I tried these options:

message: "\\>10.*"

and the regexp option:

"query": {
    "regexp": {
      "message": {
        "value": ">10.*",

but nothing works.

Does anybody know whether it is possible to filter on exactly the characters I want?

It seems like you have a text analysis problem: at index time, characters such as - are likely dropped (at least they are by default) because they are considered separators between tokens.

A query in the OpenSearch DSL translates into a Lucene query based on its type. A match_phrase query analyzes your query string (by default with the same analyzer as the field you are searching) and produces a phrase query out of it. So it too, by default, drops things like -.

A regexp query is a term-level query, in the sense that it doesn’t analyze your query string. This can be a problem if, for example, your index-time analyzer drops the - and you search for something that contains a -, because Lucene will not find such a term.
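One way to check what actually ends up in the index is to run a sample message through the _analyze API and look at the tokens it returns. Below is a minimal sketch in Python that only builds the request (it does not send it). The sample message, the IP addresses in it, and the index name are made up for illustration, since the real message layout isn't shown in full here.

```python
import json

def analyze_request(text, index=None, field=None):
    """Build (method, path, body) for an _analyze call.

    With index and field, the analyzer configured for that field is used;
    otherwise the built-in "standard" analyzer is requested explicitly.
    """
    if index and field:
        return "POST", f"/{index}/_analyze", {"field": field, "text": text}
    return "POST", "/_analyze", {"analyzer": "standard", "text": text}

# Hypothetical sample message: the addresses are invented for this example.
sample = "[Priority: 2] {icmp} 10.1.2.3 -> 10.4.5.6"

method, path, body = analyze_request(sample)
print(method, path)
print(json.dumps(body, indent=2))
```

If "-" and ">" don't show up in any of the returned tokens, no match_phrase or regexp query can match on them with the current analyzer.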

To filter the characters you want, I see two options:

  1. Change analysis. By the looks of it, you’ll need a custom analyzer built around the pattern tokenizer.
  2. Parse the data. If you had your “” in a separate field, that field can be a keyword and you can filter on that. It would be faster and more precise. You could do this parsing in an ingest pipeline or in your ETL outside OpenSearch.
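Both options could be sketched roughly as follows. This is an illustration under assumptions, not a drop-in solution: it assumes the message contains something like "10.1.2.3 -> 10.4.5.6" (source, arrow, destination, separated by spaces) and that "internal" means 10.0.0.0/8. The index settings names (ws_tokenizer, keep_symbols) and the fields src_ip, dst_ip and internal_only are hypothetical.

```python
import ipaddress
import json
import re

# --- Option 1: custom analyzer built around the pattern tokenizer ---
# Splitting on whitespace only keeps "->" and whole IP addresses as tokens.
# Analyzer and tokenizer names are hypothetical.
index_body = {
    "settings": {"analysis": {
        "tokenizer": {"ws_tokenizer": {"type": "pattern", "pattern": "\\s+"}},
        "analyzer": {"keep_symbols": {"type": "custom",
                                      "tokenizer": "ws_tokenizer",
                                      "filter": ["lowercase"]}},
    }},
    "mappings": {"properties": {
        "message": {"type": "text", "analyzer": "keep_symbols"},
    }},
}
# match_phrase_prefix treats the last term ("10.") as a prefix.
prefix_query = {"query": {"match_phrase_prefix": {"message": {"query": "-> 10."}}}}

# --- Option 2: parse the data before indexing (ETL side) ---
# Assumed layout: "<src ip>[:port] -> <dst ip>[:port]".
FLOW_RE = re.compile(
    r"(?P<src>\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?\s*->\s*"
    r"(?P<dst>\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?"
)
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")  # assumption about the network

def extract_flow(message):
    """Return src/dst fields and an internal_only flag, or None if not parseable."""
    m = FLOW_RE.search(message)
    if m is None:
        return None
    try:
        src = ipaddress.ip_address(m.group("src"))
        dst = ipaddress.ip_address(m.group("dst"))
    except ValueError:  # e.g. an octet above 255
        return None
    return {
        "src_ip": str(src),
        "dst_ip": str(dst),
        "internal_only": src in INTERNAL_NET and dst in INTERNAL_NET,
    }

# With the parsed field indexed, the alert query becomes a plain term filter.
term_query = {"query": {"term": {"internal_only": True}}}

print(json.dumps(index_body, indent=2))
print(extract_flow("[Priority: 2] {icmp} 10.1.2.3 -> 10.4.5.6"))
```

Keep in mind that match_phrase_prefix only expands the prefix to a limited number of terms (max_expansions, 50 by default), so with many distinct addresses option 1 can miss documents. That is one more reason the parsed-field approach tends to be more precise.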