Is the k-NN plugin based on Apache OpenNLP?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):


Describe the issue:
Does the k-NN plugin, or any NLP feature of OpenSearch, use Apache OpenNLP code?


Relevant Logs or Screenshots:

What exact code are you looking for here? Knowing that will help us give a precise answer to this question.

I’m just doing a comparison between Elasticsearch and OpenSearch for my boss. I’m not looking for any specific code, just wondering whether OpenSearch uses Apache OpenNLP for any NLP features. I know that Elasticsearch and OpenSearch both use Apache Lucene for keyword search. I was wondering whether OpenSearch relies heavily on Apache OpenNLP for semantic search.

For semantic search, OpenSearch lets customers load their own custom models and also provides its own list of pretrained models that can be used. Based on my initial investigation, OpenSearch is not using OpenNLP.

For semantic search, see: Neural Search plugin - OpenSearch documentation

For ML models, see: Pretrained models - OpenSearch documentation

And details on ML can be found here: About ML Commons - OpenSearch documentation
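To make the semantic-search workflow the links above describe more concrete, here is a hedged sketch of the shape of a Neural Search query, as a Python dict mirroring the JSON body the plugin accepts. The field name, model ID, and default values are hypothetical placeholders, not taken from this thread.

```python
def build_neural_query(query_text, model_id, vector_field="passage_embedding", k=10):
    """Build a query body asking the Neural Search plugin to embed
    `query_text` with the deployed model `model_id` and run a k-NN
    search against `vector_field`. All names here are illustrative."""
    return {
        "query": {
            "neural": {
                vector_field: {
                    "query_text": query_text,
                    "model_id": model_id,  # placeholder: ID of a model registered in ML Commons
                    "k": k,
                }
            }
        }
    }

# Example: the body you would POST to an index's _search endpoint.
body = build_neural_query("who is the king in the north", "MODEL_ID_PLACEHOLDER")
```

The point is that the plugin does the embedding server-side with a model managed by ML Commons; no OpenNLP is involved in this path.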

I’ve been working with OpenSearch’s neural plugin for the last three months and have it working on my laptop. I also know how to train a BERT transformer model and convert a PyTorch BERT transformer model to TorchScript. Thank you for sharing your investigation, though.


So, is your question answered? I couldn’t tell from your last reply.

But it’s good to know that you are already using the neural plugin.

Just to add some context: we moved to using Lucene’s implementation of k-NN. Here is the RFC that describes the work that was done.

@dtaivpp one small correction: we added the Lucene implementation of k-NN as another engine, alongside the other k-NN engines nmslib and faiss. Lucene k-NN was not added to OpenSearch core; it was added via the k-NN plugin. I know the issue heading ([RFC] Lucene based kNN search support in core OpenSearch) is misleading. Just a small correction. :slight_smile:
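The engine choice described above shows up as a single parameter in the index mapping. Below is a hedged sketch of a k-NN index body, written as a Python dict mirroring the JSON; the field name and dimension are illustrative, and the `engine`/`space_type` values follow the k-NN plugin documentation.

```python
def knn_index_body(engine="lucene", space_type="cosinesimil", dimension=768):
    """Sketch of an index body where `engine` selects the ANN backend
    (lucene, nmslib, or faiss) for an HNSW-based knn_vector field.
    Field name `my_vector` and dimension 768 are illustrative."""
    assert engine in ("lucene", "nmslib", "faiss")
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN for the index
        "mappings": {
            "properties": {
                "my_vector": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",        # approximate nearest-neighbor algorithm
                        "engine": engine,      # which backend implements it
                        "space_type": space_type,
                    },
                }
            }
        },
    }
```

Swapping `engine="faiss"` or `engine="nmslib"` changes only this one field; the query side stays the same, which is what makes engine comparisons straightforward.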


Ah that is great to know. Thank you for that :smiley:

Thank you @Navneet and @dtaivpp . This is great information. Thanks for taking the time to resolve my question.

@johnt Thanks for your interest in OpenSearch ML features. Hope you find them helpful. If you have some time, could you share some feedback? Is the feature helpful? What can we do to make it better? What’s missing for your use case?


We are test-driving OpenSearch semantic search with different BERT models. We were delighted that OpenSearch 2.4 was released in November. We trained our models using Stanford’s SQuAD dataset and loaded Game of Thrones text into OpenSearch for testing; we will eventually load our customers’ data into OpenSearch. We are in the process of selecting which engine to use, and have noticed that faiss and nmslib have similar performance. We’ve also noticed that innerproduct provides better search results than cosinesimil, though this may change as we do more testing. We’ve also found that DistilBERT models such as multi-qa-distilbert-cos-v1 do not return results that are as accurate. I’ll have more to share once we do more testing and get approval to use customer data.
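One reason innerproduct and cosinesimil can rank results differently, as observed above: inner product is sensitive to vector magnitude, while cosine similarity normalizes it away. The two only agree when the embeddings are unit-normalized. A minimal self-contained sketch (toy vectors, not real embeddings):

```python
import math

def inner_product(a, b):
    """Raw dot product; grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine_sim(a, b):
    """Dot product divided by both norms; magnitude-invariant."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return inner_product(a, b) / (na * nb)

q = [1.0, 2.0, 3.0]       # query vector
d1 = [1.0, 2.0, 3.0]      # same direction as q
d2 = [2.0, 4.0, 6.0]      # same direction, twice the magnitude

# cosinesimil scores d1 and d2 identically...
assert abs(cosine_sim(q, d1) - cosine_sim(q, d2)) < 1e-9
# ...while innerproduct prefers the higher-magnitude d2.
assert inner_product(q, d2) > inner_product(q, d1)
```

So if a model does not emit normalized embeddings, the two space types produce genuinely different rankings, which may explain part of the difference seen in testing.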


Thanks @johnt for sharing this. Hope your upcoming tests go well. Don’t hesitate to reach out if you run into any issues; we are glad to help.
