Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 2.19.4
Describe the issue:
I created an index with the settings and mappings below, then indexed a single document for testing. The field is analyzed so that only the digits are kept; for example, indexing a value like “aaa123” retains only the “123” portion.
Likewise, if a user searches for something like “zzz999”, the only token generated should be “999”.
This mostly works. However, if the user enters a query_string query that contains an asterisk, the document is returned regardless of whether it actually matches. For example, searching for:
*asdf*
produces a match with the document below, which has a value of 1234. This is in contrast to the two _analyze requests below, which produce tokens that don’t match each other.
I would be really grateful for any help that anyone could provide. Thanks in advance.
Configuration:
PUT ds1
{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_nondigits": {
          "type": "pattern_replace",
          "pattern": "\\D",
          "replacement": ""
        }
      },
      "filter": {
        "remove_empty_tokens": {
          "type": "length",
          "min": 1
        },
        "replace_empty_with_null": {
          "type": "pattern_replace",
          "pattern": "^$",
          "replacement": "<NULL>"
        }
      },
      "analyzer": {
        "special_number_analyzer": {
          "type": "custom",
          "char_filter": [
            "strip_nondigits"
          ],
          "tokenizer": "keyword",
          "filter": [
            "remove_empty_tokens"
          ]
        },
        "special_number_analyzer_search": {
          "type": "custom",
          "char_filter": [
            "strip_nondigits"
          ],
          "tokenizer": "keyword",
          "filter": [
            "replace_empty_with_null"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "special_number_field": {
        "type": "text",
        "analyzer": "special_number_analyzer",
        "search_analyzer": "special_number_analyzer_search"
      }
    }
  }
}
POST /_bulk?refresh=true
{ "index": { "_index": "ds1"} }
{ "special_number_field": "1234" }
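To double-check which term was actually stored for this document, the term vectors API can be queried for the field. The _id used here is the auto-generated one that appears in the search results further down, so it will differ on any other cluster. Based on the index analyzer configuration, the only stored term should be "1234" (request only; output not included):

GET ds1/_termvectors/vB0QnJ0Bpkf8R5zRYeIl?fields=special_number_field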
Relevant Logs or Screenshots:
Using this analyzer:
GET ds1/_analyze
{
  "text": "*asdf*",
  "analyzer": "special_number_analyzer"
}
correctly produces:
{
  "tokens": []
}
Using this search_analyzer:
GET ds1/_analyze
{
  "text": "*asdf*",
  "analyzer": "special_number_analyzer_search"
}
correctly produces:
{
  "tokens": [
    {
      "token": "<NULL>",
      "start_offset": 6,
      "end_offset": 6,
      "type": "word",
      "position": 0
    }
  ]
}
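For comparison, the same _analyze API can be used on the examples from the description. Based on the configuration (strip every non-digit, then keep the remainder as one keyword token), the first request should return the single token "123" and the second the single token "999". Only the requests are shown here; their output is not included:

GET ds1/_analyze
{
  "text": "aaa123",
  "analyzer": "special_number_analyzer"
}
GET ds1/_analyze
{
  "text": "zzz999",
  "analyzer": "special_number_analyzer_search"
}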
Running this query using the analyzer:
GET ds1/_search
{
  "query": {
    "query_string": {
      "query": "*asdf*",
      "analyzer": "special_number_analyzer"
    }
  }
}
produces this unwanted hit:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "ds1",
        "_id": "vB0QnJ0Bpkf8R5zRYeIl",
        "_score": 1,
        "_source": {
          "special_number_field": "1234"
        }
      }
    ]
  }
}
Running this query using the search_analyzer:
GET ds1/_search
{
  "query": {
    "query_string": {
      "query": "*asdf*",
      "analyzer": "special_number_analyzer_search"
    }
  }
}
produces the same unwanted hit:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "ds1",
        "_id": "vB0QnJ0Bpkf8R5zRYeIl",
        "_score": 1,
        "_source": {
          "special_number_field": "1234"
        }
      }
    ]
  }
}
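To see which Lucene query the query_string is actually rewritten into, the validate API can be run on the same query with explain=true. This is a suggested diagnostic rather than something already tried, so its output is not included here. It may help show whether the wildcard term is passed through the configured analyzer at all:

GET ds1/_validate/query?explain=true
{
  "query": {
    "query_string": {
      "query": "*asdf*",
      "analyzer": "special_number_analyzer_search"
    }
  }
}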