Improving OpenSearch relevance with better normalization (beyond stemming)

TonyJ · April 19, 2026, 7:29pm

Describe the issue:

In several OpenSearch projects we’ve seen stemming introduce noise early in the pipeline, especially in multilingual setups.

For example:

“organization” → “organ”
“news” → “new”
“united” → “unit”

These kinds of transformations collapse unrelated terms into the same form, which affects matching quality and introduces noise into the index.
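
A minimal sketch of the collapse described above. The term-to-stem pairs are the ones quoted in this post, plus identity pairs for the unrelated words each stem collides with. The helper names (`collisions`, `analyze_body`) are made up for illustration. The second helper only builds a request body for the `_analyze` API with the built-in `porter_stem` filter, so you can check what your own cluster emits. It does not send anything.

```python
import json
from collections import defaultdict

# Term -> stem pairs quoted in this post. The identity pairs ("organ", "new",
# "unit") are the unrelated words that end up sharing the same stem.
STEMS = {
    "organization": "organ",
    "organ": "organ",
    "news": "new",
    "new": "new",
    "united": "unit",
    "unit": "unit",
}


def collisions(stems):
    """Group surface terms by stem and keep only stems shared by 2+ terms."""
    groups = defaultdict(list)
    for term, stem in stems.items():
        groups[stem].append(term)
    return {stem: sorted(terms) for stem, terms in groups.items() if len(terms) > 1}


def analyze_body(text):
    """Build a body for POST /_analyze (standard tokenizer, lowercase, porter_stem)."""
    return {
        "tokenizer": "standard",
        "filter": ["lowercase", "porter_stem"],
        "text": text,
    }


if __name__ == "__main__":
    # Each entry is a stem that two different words now share in the index.
    print(collisions(STEMS))
    # Paste this body into Dev Tools (POST /_analyze) to inspect real stemmer output.
    print(json.dumps(analyze_body("organization news united"), indent=2))
```

Running the same grouping over the tokens returned by `_analyze` for a sample of your own vocabulary is one way to estimate how many unrelated terms a given stemmer merges.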

In practice, this often leads to:

- less precise retrieval
- more reliance on query-side complexity (ngrams, fuzzy, etc.)
- inconsistent behavior across languages

We’ve been exploring an alternative approach using a lightweight plugin that adds proper lemmatization and decompounding before indexing. It’s simple to integrate and doesn’t require changes to query logic.
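
To make the idea concrete, here is a toy sketch of dictionary-based lemmatization plus decompounding as a pre-indexing step. It is not the plugin mentioned above and says nothing about how that plugin works. The lexicons (`LEMMAS`, `PARTS`) and function names are hypothetical, and a real setup would load per-language dictionaries instead of three hard-coded entries.

```python
# Hypothetical mini-lexicons, for illustration only.
LEMMAS = {"organizations": "organization", "mice": "mouse", "ran": "run"}
PARTS = {"haus", "boot"}  # known compound parts (German: house, boat)


def decompound(token, parts, min_len=3):
    """Split a token into known parts, trying the longest head first.

    Returns [token] unchanged when no complete split into known parts exists.
    """
    for end in range(len(token) - min_len, min_len - 1, -1):
        head, tail = token[:end], token[end:]
        if head in parts:
            if tail in parts:
                return [head, tail]
            rest = decompound(tail, parts, min_len)
            if len(rest) > 1:
                return [head] + rest
    return [token]


def normalize(text, lemmas, parts):
    """Lowercase, split on whitespace, map to lemmas, then decompound."""
    out = []
    for tok in text.lower().split():
        tok = lemmas.get(tok, tok)  # unknown words pass through untouched
        out.extend(decompound(tok, parts))
    return out


if __name__ == "__main__":
    print(normalize("Organizations united news", LEMMAS, PARTS))
    print(normalize("Hausboot", LEMMAS, PARTS))
```

The point of the lookup approach is that a word is only rewritten when the dictionary knows it, so "organization", "news" and "united" stay distinct from "organ", "new" and "unit". This sketch drops the original compound after splitting. Keeping it alongside the parts is a common variation.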

So far, we’re seeing improvements in:

- lexical matching
- index quality
- consistency (also when combined with semantic search)

More detailed examples here:
https://www.linkedin.com/pulse/how-increase-search-relevance-elasticsearch-better-text-tony-chac%C3%B3n-arkic

Curious how others are handling this kind of issue.

Configuration:

- OpenSearch (standard analyzers with stemming)
- Multilingual datasets
- Combination of lexical and semantic search in some cases

TonyJ · April 20, 2026, 3:02pm

One thing we’re seeing is that many teams compensate with ngrams or fuzzy matching, or move to semantic search, but the normalization layer underneath is still noisy.

Has anyone tried improving normalization before indexing instead?
