Extending Neural Search pipeline to Named entity recognition and other metadata extracting models

I have a use case that involves a named entity recognition (NER) model for documents and queries at both indexing and query time. Documents will be filtered by matching their extracted entities against the query's extracted entities. The pipeline would work much like the existing neural search pipeline, with one difference: in this use case, queries and documents are passed through a NER model and enriched with extra metadata (the extracted entities) instead of the vectors produced by an embedding model.

It would be useful to extend the neural search pipeline to include models for named entity extraction, embeddings, image segmentation (finding image components for image search), and so on, so that a query or document gathers enough metadata from the various models in the pipeline before matching.
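A minimal sketch of the matching idea described above, in Python. The `extract_entities` function here is a toy dictionary lookup standing in for a real NER model (all names are hypothetical); it shows how entity metadata attached at index time could filter documents against a query's entities:

```python
# Toy stand-in for a NER model. In a real pipeline this would be an ML model
# invoked by the ingest/search pipeline; the names here are hypothetical.
KNOWN_ENTITIES = {"amazon", "opensearch", "seattle"}

def extract_entities(text):
    """Return the set of known entities mentioned in the text."""
    return {tok.strip(".,").lower() for tok in text.split()} & KNOWN_ENTITIES

# "Indexing": attach extracted entities as metadata alongside each document,
# the way the neural search pipeline attaches embedding vectors today.
docs = [
    {"id": 1, "text": "OpenSearch is developed in Seattle."},
    {"id": 2, "text": "This document mentions nothing relevant."},
]
for doc in docs:
    doc["entities"] = extract_entities(doc["text"])

# "Querying": extract entities from the query, then keep only documents
# whose stored entity metadata overlaps with the query's entities.
query = "Tell me about OpenSearch"
query_entities = extract_entities(query)
matches = [doc["id"] for doc in docs if doc["entities"] & query_entities]
print(matches)  # → [1]
```

The same pattern generalizes to other metadata extractors: each model in the pipeline adds a field to the document, and the query side applies the same model before matching.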

Hi Praveen,
Thanks for reaching out. This seems like a feature request, so I am creating a GitHub issue for it in the Neural Search plugin repo. Please respond on that issue with more details about your use case, and we can continue the discussion on GitHub.

Github issue: Extending Neural Search pipeline to Named entity recognition and other metadata extracting models · Issue #134 · opensearch-project/neural-search · GitHub

Hi Praveen,

Thanks for your feedback. We are planning to extend our support for ML models beyond the text embedding models that we currently support.

For your NER use case, are you only interested in support for self-managed NER models, or are you also interested in being able to use a third-party API or managed service that provides NER functionality as a service?


@dylan ,

Thanks for the response. I would say supporting multiple types here would be useful:

  1. Custom models - the user has their own model, uploads it to OpenSearch (ml-commons), and uses it as part of a search/indexing pipeline.
  2. Third-party models - the user wants to call a Hugging Face/SageMaker model through an API as part of the pipeline.
  3. Managed services - fully managed services like Amazon Comprehend (for text) or Amazon Rekognition (for images) can be invoked to extract metadata for the docs/queries in the pipeline.
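For the custom-model case (1), one way this could look is an ingest pipeline processor analogous to the existing `text_embedding` processor in the neural-search plugin. The `entity_extraction` processor name and its output field below are hypothetical, sketched by analogy, not an existing API:

```json
PUT /_ingest/pipeline/ner-pipeline
{
  "description": "Hypothetical pipeline: extract entities instead of embeddings",
  "processors": [
    {
      "entity_extraction": {
        "model_id": "<ml-commons model id>",
        "field_map": {
          "passage_text": "passage_entities"
        }
      }
    }
  ]
}
```

Under this sketch, documents indexed through the pipeline would carry a `passage_entities` field that queries could filter on, mirroring how `text_embedding` writes a vector field today.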


@Praveen, thanks for the feedback. We have plans to support all of these scenarios as part of the ML framework that we're actively improving in the ml-commons plugin. Stay tuned!

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.