Null values matched in query_string with asterisks

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 2.19.4

Describe the issue:

I created an index with the settings and mapping below, along with a single document for testing. The field is analyzed so that only the digits are kept; for example, indexing a value like “aaa123” retains only the “123” portion.

Likewise, if a user searches for something like “zzz999”, the only token generated should be “999”.

This works for the most part. But if the user enters a query_string that contains an asterisk, then the document is returned regardless of whether it actually matches. For example, searching for:

*asdf*

produces a match with the document below, which has a value of 1234. This is in contrast to the two _analyze requests listed below: for the same input, the index analyzer produces no tokens and the search analyzer produces only <NULL>, so neither matches the indexed token 1234.

I would be really grateful for any help that anyone could provide. Thanks in advance.

Configuration:

PUT ds1
{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_nondigits": {
          "type": "pattern_replace",
          "pattern": "\\D",
          "replacement": ""
        }
      },
      "filter": {
        "remove_empty_tokens": {
          "type": "length",
          "min": 1
        },
        "replace_empty_with_null": {
          "type": "pattern_replace",
          "pattern": "^$",
          "replacement": "<NULL>"
        }
      },
      "analyzer": {
        "special_number_analyzer": {
          "type": "custom",
          "char_filter": [
            "strip_nondigits"
          ],
          "tokenizer": "keyword",
          "filter": [
            "remove_empty_tokens"
          ]
        },
        "special_number_analyzer_search": {
          "type": "custom",
          "char_filter": [
            "strip_nondigits"
          ],
          "tokenizer": "keyword",
          "filter": [
            "replace_empty_with_null"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "special_number_field": {
        "type": "text",
        "analyzer": "special_number_analyzer",
        "search_analyzer": "special_number_analyzer_search"
      }
    }
  }
}

POST /_bulk?refresh=true
{ "index": { "_index": "ds1"} }
{ "special_number_field": "1234" }

Relevant Logs or Screenshots:

Using this analyzer:

GET ds1/_analyze
{
  "text": "*asdf*",
  "analyzer": "special_number_analyzer"
}

correctly produces:

{
  "tokens": []
}

Using this search_analyzer:

GET ds1/_analyze
{
  "text": "*asdf*",
  "analyzer": "special_number_analyzer_search"
}

correctly produces:

{
  "tokens": [
    {
      "token": "<NULL>",
      "start_offset": 6,
      "end_offset": 6,
      "type": "word",
      "position": 0
    }
  ]
}
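
To reason about the two analyzers without a cluster, here is a rough Python model of them. It is only a sketch under a few assumptions: the function names are made up for illustration, `[^0-9]` stands in for Java's `\D`, and the keyword tokenizer is assumed to emit a single empty token for empty input, which is what the `<NULL>` output above suggests.

```python
import re


def strip_nondigits(text: str) -> str:
    """Model of the strip_nondigits char filter (pattern \\D, replacement "")."""
    return re.sub(r"[^0-9]", "", text)


def special_number_analyzer(text: str) -> list[str]:
    """Index-time chain: char filter -> keyword tokenizer -> length filter (min 1)."""
    token = strip_nondigits(text)  # keyword tokenizer: the whole input is one token
    return [token] if len(token) >= 1 else []  # remove_empty_tokens


def special_number_analyzer_search(text: str) -> list[str]:
    """Search-time chain: char filter -> keyword tokenizer -> replace_empty_with_null."""
    token = strip_nondigits(text)
    return [re.sub(r"^$", "<NULL>", token)]  # only an empty token is rewritten


if __name__ == "__main__":
    for value in ("aaa123", "zzz999", "*asdf*"):
        print(value, special_number_analyzer(value), special_number_analyzer_search(value))
```

In this model, "*asdf*" gives no token at index time and only <NULL> at search time, so a match against 1234 cannot come from the analyzed tokens themselves.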

Running this query using the analyzer:

GET ds1/_search
{
  "query": {
    "query_string": {
      "query": "*asdf*",
      "analyzer": "special_number_analyzer"
    }
  }
}

produces this unwanted hit:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "ds1",
        "_id": "vB0QnJ0Bpkf8R5zRYeIl",
        "_score": 1,
        "_source": {
          "special_number_field": "1234"
        }
      }
    ]
  }
}

Running this query using the search_analyzer:

GET ds1/_search
{
  "query": {
    "query_string": {
      "query": "*asdf*",
      "analyzer": "special_number_analyzer_search"
    }
  }
}

produces the same unwanted hit:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "ds1",
        "_id": "vB0QnJ0Bpkf8R5zRYeIl",
        "_score": 1,
        "_source": {
          "special_number_field": "1234"
        }
      }
    ]
  }
}

@mmi This looks like a bug in the query_string wildcard handling. I suspect the analyzer runs correctly and produces zero tokens (as expected for non-digit input such as asdf), but nothing afterwards checks whether any tokens were actually produced.
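
To make that suspicion concrete, here is a small Python sketch of one way it could play out. This is a guess, not OpenSearch's actual code: it assumes the literal text between wildcard characters gets run through the digit-stripping char filter while the `*` and `?` characters are kept, and that nothing checks for an empty result. The names `normalize_wildcard` and `matches` are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def normalize_wildcard(pattern: str) -> str:
    """Strip non-digits from the literal chunks, keep '*' and '?' untouched."""
    parts = re.split(r"([*?])", pattern)  # capturing group keeps the wildcards
    return "".join(
        p if p in ("*", "?") else re.sub(r"[^0-9]", "", p)
        for p in parts
    )


def matches(indexed_term: str, pattern: str) -> bool:
    """Glob-match an indexed term against the normalized wildcard pattern."""
    return fnmatchcase(indexed_term, normalize_wildcard(pattern))


if __name__ == "__main__":
    print(normalize_wildcard("*asdf*"))  # the literal part disappears entirely
    print(matches("1234", "*asdf*"))
    print(matches("1234", "*99*"))
```

Under these assumptions "*asdf*" collapses to a pattern made only of wildcards, which would match any indexed value, while a pattern that still contains digits, such as "*99*", would behave as expected.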

Could you open a GitHub issue in the OpenSearch repo? If you do so, please share the link here for traceability.

Thank you, Pablo, for the quick response. As you suggested, I opened an issue on GitHub:

[BUG] Null values matched in query_string with asterisks · Issue #21280 · opensearch-project/OpenSearch

Kind regards.