Versions (relevant - OpenSearch/Dashboards/Server OS/Browser):
OpenSearch 2.9 (using AWS OpenSearch Service)
Describe the issue:
I would like to know how to implement a did-you-mean feature for Chinese document search.
We have implemented a search function for Chinese (Traditional). The index structure is shown below.
[Index Structure]
PUT cht-index/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": ["asciifolding", "lowercase"],
          "char_filter": ["symbol_remover"]
        }
      },
      "normalizer": {
        "custom_normalizer": {
          "type": "custom",
          "filter": ["asciifolding", "lowercase"],
          "char_filter": ["symbol_remover"]
        }
      },
      "char_filter": {
        "symbol_remover": {
          "type": "mapping",
          "mappings": [
            ", => ",
            "。 => ",
            "、 => ",
            "「 => ",
            "」 => ",
            "『 => ",
            "』 => ",
            "… => ",
            "‧ => ",
            "- => ",
            "( => ",
            ") => ",
            "( => ",
            ") => ",
            "《 => ",
            "》 => ",
            "〈 => ",
            "〉 => ",
            ": => ",
            "; => ",
            ": => ",
            "; => ",
            "! => ",
            "? => ",
            "! => ",
            "? => ",
            ", => ",
            "。 => ",
            "、 => ",
            "「 => ",
            "」 => ",
            "< => ",
            "> => ",
            "< => ",
            "> => "
          ]
        }
      }
    }
  }
}
PUT cht-index/_mapping
{
  "dynamic": "strict",
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "custom_analyzer",
      "fields": {
        "keyword": {
          "type": "keyword",
          "normalizer": "custom_normalizer"
        }
      }
    },
    "content": {
      "type": "text",
      "analyzer": "custom_analyzer",
      "fields": {
        "keyword": {
          "type": "keyword",
          "normalizer": "custom_normalizer"
        }
      }
    }
  }
}
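For reference, documents contain just these two fields; a minimal indexing example (the sample values below are arbitrary placeholders, not real data):

PUT cht-index/_doc/1
{
  "title": "全文檢索",
  "content": "這是一個全文檢索引擎的測試文件。"
}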
The analyzer uses ik_max_word.
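Tokenization behavior can be checked with the _analyze API; a minimal example (the sample sentence is an arbitrary placeholder):

POST cht-index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "全文檢索引擎"
}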
We would like to provide did-you-mean as part of the search function, but our attempts so far do not seem to work well for Chinese. If there is a way to implement did-you-mean functionality for Chinese (Traditional), please let us know.
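To make the question concrete, here is a minimal sketch of the kind of request we have in mind, based on the phrase suggester against the content field (the suggest text and parameter values are illustrative assumptions, not a working solution):

POST cht-index/_search
{
  "size": 0,
  "suggest": {
    "did_you_mean": {
      "text": "全文檢索引擎",
      "phrase": {
        "field": "content",
        "size": 3,
        "direct_generator": [
          {
            "field": "content",
            "suggest_mode": "missing"
          }
        ]
      }
    }
  }
}

Is a suggester-based approach like this viable for Chinese text tokenized with ik_max_word, or is a different technique needed?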