Term Vectors, stemmers, tokenizers, stop words etc

orestis · October 6, 2021, 6:42pm

Last year I ran a pilot project that used Elasticsearch to power full-text search. I was using 7.7 at the time, on the AWS Elasticsearch Service.

I created custom analyzers as described in "Language analyzers" (Elasticsearch Guide [8.4]), used character filters like html_strip, and was planning to use the Term vectors API (Elasticsearch Guide [8.4])…
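For reference, the setup described above (a custom English analyzer with the html_strip character filter in front, plus stored term vectors) can be sketched as the JSON bodies sent to the REST API. This is a minimal sketch that assumes the 7.10-era Elasticsearch analysis and term vectors APIs carry over unchanged to OpenSearch; the index name `articles`, the field `body` and the analyzer name `english_html` are hypothetical. It only builds and prints the request bodies, so any HTTP client can send them.

```python
import json
from typing import Tuple

# Hypothetical names, for illustration only.
INDEX = "articles"
FIELD = "body"
ANALYZER = "english_html"


def index_body() -> dict:
    """Body for PUT /<INDEX>: an English analyzer rebuilt as a custom one, with HTML stripped first."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "english_stop": {"type": "stop", "stopwords": "_english_"},
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                    "english_possessive_stemmer": {
                        "type": "stemmer",
                        "language": "possessive_english",
                    },
                },
                "analyzer": {
                    ANALYZER: {
                        "type": "custom",
                        # html_strip is a built-in character filter; it runs before the tokenizer.
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                        # Token filters are applied in the order listed.
                        "filter": [
                            "english_possessive_stemmer",
                            "lowercase",
                            "english_stop",
                            "english_stemmer",
                        ],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                FIELD: {
                    "type": "text",
                    "analyzer": ANALYZER,
                    # Store term vectors at index time instead of computing them on request.
                    "term_vector": "with_positions_offsets",
                }
            }
        },
    }


def termvectors_request(doc_id: str) -> Tuple[str, str, dict]:
    """Method, path and body for fetching the term vectors of one indexed document."""
    body = {
        "fields": [FIELD],
        "offsets": True,
        "positions": True,
        "term_statistics": True,
        "field_statistics": True,
    }
    return "GET", f"/{INDEX}/_termvectors/{doc_id}", body


if __name__ == "__main__":
    print("PUT /" + INDEX)
    print(json.dumps(index_body(), indent=2))
    method, path, body = termvectors_request("1")
    print(method, path)
    print(json.dumps(body, indent=2))
```

Rebuilding the language analyzer as a custom one, rather than using the built-in english analyzer directly, is what makes room for the html_strip character filter in front of the stemming chain.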

Now that I’m revisiting the project, I can’t find references to any of these in the OpenSearch documentation, even though, as a 7.10 fork, OpenSearch should have those features. Is this an oversight in the documentation, or are there differences between the projects?

In general I’m a bit worried, because the OpenSearch docs are all focused on log ingestion and there aren’t many examples of its text analysis capabilities. Knowing this would help me a lot in deciding whether to base this project on OpenSearch or go with an Elastic Cloud license.

Thanks!

kris · October 15, 2021, 6:52pm

Hello @orestis, welcome to the community. As you mentioned, yes, it is derived from 7.10.2. However, we did not fork the documentation at the time. The team is working diligently on building the necessary documentation content, and we track that work in the open on the GitHub repository. Here is a direct link to the backlog. I hope this helps.

orestis · October 15, 2021, 7:31pm

Thanks for clarifying. I thought it might be a documentation issue. I guess until the docs are rewritten (they weren’t under the same license? huh), I can use the existing Elasticsearch 7.10.2 docs for some things.
