Extending Neural Search pipeline to Named entity recognition and other metadata extracting models

I have a use case that involves a named entity recognition (NER) model for documents and queries at both indexing and query time. Documents will be filtered by matching their extracted entities against the query's extracted entities. The pipeline would work much like the existing neural search pipeline, with one difference: in this use case, queries and documents are passed through a NER model and enriched with extra metadata (the extracted entities) instead of the vectors produced by an embedding model.

It would be useful to extend the neural search pipeline to include models for named entity extraction, embeddings, image segmentation (finding image components for image search), and so on, so that a query or document gathers enough metadata from the various models in the pipeline before matching.
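A minimal sketch of the matching idea described above, in Python. The `extract_entities` function here is a toy dictionary lookup standing in for a real NER model (all names are hypothetical); it shows how entity metadata attached at index time could filter documents against a query's entities:

```python
# Toy stand-in for a NER model. In a real pipeline this would be an ML model
# invoked by the ingest/search pipeline; the names here are hypothetical.
KNOWN_ENTITIES = {"amazon", "opensearch", "seattle"}

def extract_entities(text):
    """Return the set of known entities mentioned in the text."""
    return {tok.strip(".,").lower() for tok in text.split()} & KNOWN_ENTITIES

# "Indexing": attach extracted entities as metadata alongside each document,
# the way the neural search pipeline attaches embedding vectors today.
docs = [
    {"id": 1, "text": "OpenSearch is developed in Seattle."},
    {"id": 2, "text": "This document mentions nothing relevant."},
]
for doc in docs:
    doc["entities"] = extract_entities(doc["text"])

# "Querying": extract entities from the query, then keep only documents
# whose stored entity metadata overlaps with the query's entities.
query = "Tell me about OpenSearch"
query_entities = extract_entities(query)
matches = [doc["id"] for doc in docs if doc["entities"] & query_entities]
print(matches)  # → [1]
```

The same pattern generalizes to other metadata extractors: each model in the pipeline adds a field to the document, and the query side applies the same model before matching.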

Hi Praveen,
Thanks for reaching out. This seems like a feature request, so I am creating a GitHub issue for it in the Neural Search plugin repo. Please respond on that issue with more details about your use case, and we can continue the discussion on GitHub.

Github issue: Extending Neural Search pipeline to Named entity recognition and other metadata extracting models · Issue #134 · opensearch-project/neural-search · GitHub

Hi Praveen,

Thanks for your feedback. We are planning to extend our support for ML models beyond the text embedding models that we currently support.

For your NER use case, are you only interested in support for self-managed NER models, or are you also interested in being able to use a third-party API or managed service that provides NER functionality as a service?


@dylan ,

Thanks for the response. I would say supporting multiple types here would be useful:

  1. Custom models - the user has their own model, uploads it to OpenSearch (ml-commons), and uses it as part of a search/indexing pipeline.
  2. Third-party models - the user wants to call a Hugging Face/SageMaker model through an API as part of the pipeline.
  3. Managed services - fully managed services like Amazon Comprehend (for text) or Amazon Rekognition (for images) can be invoked to extract metadata for the docs/queries in the pipeline.
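For the custom-model case (1), one way this could look is an ingest pipeline processor analogous to the existing `text_embedding` processor in the neural-search plugin. The `entity_extraction` processor name and its output field below are hypothetical, sketched by analogy, not an existing API:

```json
PUT /_ingest/pipeline/ner-pipeline
{
  "description": "Hypothetical pipeline: extract entities instead of embeddings",
  "processors": [
    {
      "entity_extraction": {
        "model_id": "<ml-commons model id>",
        "field_map": {
          "passage_text": "passage_entities"
        }
      }
    }
  ]
}
```

Under this sketch, documents indexed through the pipeline would carry a `passage_entities` field that queries could filter on, mirroring how `text_embedding` writes a vector field today.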


@Praveen, thanks for the feedback. We have plans to support all of these scenarios as part of the ML framework that we're actively improving in the ml-commons plugin. Stay tuned!

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.