Built-in Filters and Tokenizers Available in OpenSearch from ElasticSearch

Hello -
Older versions (7.x) of Elasticsearch have lots of built-in tokenizers and filters. However, the OpenSearch documentation doesn’t list them. Is there a list of built-in tokenizers and filters for OpenSearch, so that we can decide whether to move to OpenSearch?

I am particularly interested in N-Gram and Edge N-Gram.

Thanks,
Manu

It’s funny, I am literally looking for the same thing right now… It seems like there isn’t a comprehensive list at the moment, but here is an example I found of an edge-ngram filter being used.

Yes, all those token filters exist. The documentation is not great, but you can always refer to Elastic’s 7.10 documentation, since 7.10 is the version OpenSearch was forked from.

https://www.elastic.co/guide/en/elasticsearch/reference/7.10/analysis-tokenfilters.html
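
For anyone landing here later, below is a minimal sketch of what an edge n-gram filter setup might look like, assuming the 7.10 analysis settings carry over to OpenSearch unchanged. The filter name "autocomplete_filter", the analyzer name "autocomplete", and the gram sizes are made up for illustration. The small edge_ngrams helper is not part of any library; it only mimics, roughly, the prefixes such a filter would emit for a single term.

```python
import json


def edge_ngrams(term, min_gram=2, max_gram=5):
    """Illustrative helper (not a library function): return the prefixes
    of `term` whose length is between min_gram and max_gram."""
    longest = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, longest + 1)]


# Index creation body using the 7.10-style analysis settings.
# "autocomplete_filter" and "autocomplete" are hypothetical names.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 5,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    }
}

if __name__ == "__main__":
    # JSON to send as the request body when creating the index.
    print(json.dumps(index_body, indent=2))
    # Rough idea of the tokens produced for the term "search".
    print(edge_ngrams("search"))
```

The printed JSON would be sent as the body of the index creation request. It is worth testing against your own cluster before relying on it.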

I hope I didn’t step on anyone’s toes, but I took the liberty of filing https://github.com/opensearch-project/documentation-website/issues/790

That’s definitely something we should document. Thanks for letting us know!

Thank you all for your suggestions!

@nateynate thanks for filing the request. It will be great to have this documented.