Versions (relevant - OpenSearch/Dashboards/Server OS/Browser):
OpenSearch 2.9 (using AWS OpenSearch Service)
Describe the issue:
I would like to know how to implement a did-you-mean feature for Chinese document search.
We have implemented a search function for Chinese (Traditional). The index structure is shown below.
[Index Structure]
PUT cht-index/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": ["asciifolding", "lowercase"],
          "char_filter": ["symbol_remover"]
        }
      },
      "normalizer": {
        "custom_normalizer": {
          "type": "custom",
          "filter": ["asciifolding", "lowercase"],
          "char_filter": ["symbol_remover"]
        }
      },
      "char_filter": {
        "symbol_remover": {
          "type": "mapping",
          "mappings": [
            ", => ",
            "。 => ",
            "、 => ",
            "「 => ",
            "」 => ",
            "『 => ",
            "』 => ",
            "… => ",
            "‧ => ",
            "- => ",
            "( => ",
            ") => ",
            "( => ",
            ") => ",
            "《 => ",
            "》 => ",
            "〈 => ",
            "〉 => ",
            ": => ",
            "; => ",
            ": => ",
            "; => ",
            "! => ",
            "? => ",
            "! => ",
            "? => ",
            ", => ",
            "。 => ",
            "、 => ",
            "「 => ",
            "」 => ",
            "< => ",
            "> => ",
            "< => ",
            "> => "
          ]
        }
      }
    }
  }
}
PUT cht-index/_mapping
{
  "dynamic": "strict",
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "custom_analyzer",
      "fields": {
        "keyword": {
          "type": "keyword",
          "normalizer": "custom_normalizer"
        }
      }
    },
    "content": {
      "type": "text",
      "analyzer": "custom_analyzer",
      "fields": {
        "keyword": {
          "type": "keyword",
          "normalizer": "custom_normalizer"
        }
      }
    }
  }
}
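For reference, documents contain just these two fields; a minimal indexing example (the sample values below are arbitrary placeholders, not real data):

PUT cht-index/_doc/1
{
  "title": "全文檢索",
  "content": "這是一個全文檢索引擎的測試文件。"
}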
The analyzer uses ik_max_word.
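Tokenization behavior can be checked with the _analyze API; a minimal example (the sample sentence is an arbitrary placeholder):

POST cht-index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "全文檢索引擎"
}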
We would like to provide did-you-mean as part of the search function, but our attempts so far do not seem to work well for Chinese. If there is a way to implement did-you-mean functionality for Chinese (Traditional), please let us know.
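To make the question concrete, here is a minimal sketch of the kind of request we have in mind, based on the phrase suggester against the content field (the suggest text and parameter values are illustrative assumptions, not a working solution):

POST cht-index/_search
{
  "size": 0,
  "suggest": {
    "did_you_mean": {
      "text": "全文檢索引擎",
      "phrase": {
        "field": "content",
        "size": 3,
        "direct_generator": [
          {
            "field": "content",
            "suggest_mode": "missing"
          }
        ]
      }
    }
  }
}

Is a suggester-based approach like this viable for Chinese text tokenized with ik_max_word, or is a different technique needed?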