[Feedback] Neural Search plugin - experimental release

We are releasing neural-search as an experimental feature with OpenSearch 2.4. The feature enables neural-embedding-based search in OpenSearch: it provides the capability to index documents and run neural search queries over them. More details about the plugin can be found in this RFC: [RFC] OpenSearch neural-search plugin · Issue #11 · opensearch-project/neural-search · GitHub.

To use the plugin, you can download any pre-trained language model (e.g. from huggingface.co) and upload it to OpenSearch. Meanwhile, we'll also publish a GPT model, along with a toolkit script, so you can fine-tune your language model with your own corpus of data.

As mentioned, this feature is experimental for now. We'd love to gather feedback from you on it, to help us understand your needs around it and plan our next steps accordingly. Any comments/suggestions will be highly appreciated!


This is good news. It is very difficult to give useful feedback before giving it a try. Still, I have the following comments:

  1. Does it support highlighting? A neural search query without highlighting is in most cases going to be misleading for users. Highlighting could be based on QA models, or it could return the most similar sentence.

  2. The ingestion model should be auto-registered for a neural attribute so that querying and inference over that attribute are always consistent.

  3. Clean documentation on how to include an external model (from huggingface, for instance) is going to be crucial.

  4. Is there a list of supported/unsupported models?

I look forward to trying it.

Also, when is it expected to exit the experimental stage and become a main official feature?

I got an error with the example in the documentation. Below is the code I tried. I tested the model, and it loads and works fine. However, ingesting a document fails with the error below.

Environment: this was done on Windows in development mode (one node acting as cluster_manager, data, ingest, and ml).

Error:
{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "empty docs"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "empty docs"
    },
    "status": 400
}

My Code:
POST /_plugins/_ml/models/_upload
{
    "name": "all-MiniLM-L6-v2",
    "version": "1.0.0",
    "description": "test model",
    "model_format": "TORCH_SCRIPT",
    "model_config": {
        "model_type": "bert",
        "embedding_dimension": 384,
        "framework_type": "sentence_transformers"
    },
    "url": "https://github.com/ylwu-amzn/ml-commons/blob/2.x_custom_m_helper/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true"
}

POST /_plugins/_ml/models/kI6NhoQB3oLQzIJTkldg/_load

POST /_plugins/_ml/models/kI6NhoQB3oLQzIJTkldg/_predict
{
    "text_docs": ["today is sunny"]
}

PUT _ingest/pipeline/nlp-pipeline
{
    "description": "An example neural search pipeline",
    "processors": [
        {
            "text_embedding": {
                "model_id": "kI6NhoQB3oLQzIJTkldg",
                "field_map": {
                    "text": "text_knn"
                }
            }
        }
    ]
}

PUT /my-nlp-index-1
{
    "settings": {
        "index.knn": true,
        "default_pipeline": "nlp-pipeline"
    },
    "mappings": {
        "properties": {
            "passage_embedding": {
                "type": "knn_vector",
                "dimension": 384,
                "method": {
                    "name": "hnsw",
                    "space_type": "l2",
                    "engine": "nmslib",
                    "parameters": {
                        "ef_construction": 128,
                        "m": 24
                    }
                }
            },
            "passage_text": {
                "type": "text"
            }
        }
    }
}

POST my-nlp-index-1/_doc
{
    "passage_text": "Hello world"
}

Highlighting is a very interesting feature, but it is something the current neural search method does not include. However, as has been shown in the literature, neural models can be used purely for search and still provide a boost (compared to BM25) in terms of search relevance. An interesting example can be found here: OpenSearch 2.4.0 is available today! · OpenSearch

Detailed documentation on recommended models and combination strategies with BM25 will be released soon. We will also be releasing benchmarking results of several public models (and custom models) on challenge datasets from the BEIR benchmark. In particular, neural models when combined with BM25 yield a 15% boost on several datasets from the BEIR challenge.
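As a sketch of what such a combination can look like in practice, the lexical and neural signals can be mixed in a single bool query with a BM25 match clause and a neural clause. This reuses the index, field, and model ID from earlier in this thread; the query text, k, and any boosts are illustrative values you would tune for your data:

```json
GET /my-nlp-index-1/_search
{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "passage_text": "wild west"
                    }
                },
                {
                    "neural": {
                        "passage_embedding": {
                            "query_text": "wild west",
                            "model_id": "kI6NhoQB3oLQzIJTkldg",
                            "k": 100
                        }
                    }
                }
            ]
        }
    }
}
```

Both clauses contribute to the final score, so documents that match lexically and semantically rank highest.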

Currently, all of the supported models belong to the sentence-transformers class.

Note that our recommendations are guided by the best empirical performance of models in a zero-shot setting, i.e. models that perform best on data distributions they have not been trained on. We measure search relevance in terms of accuracy, mean average precision, and nDCG@10.

Looking at the input provided, the mapping in the nlp-pipeline is not correct. The processor takes a field map as input, which specifies which field needs to be converted to vectors. The fields provided in the nlp-pipeline do not match the ones in your input document.

To fix this, please update your fields in the pipeline like this:

PUT _ingest/pipeline/nlp-pipeline
{
    "description": "An example neural search pipeline",
    "processors": [
        {
            "text_embedding": {
                "model_id": "kI6NhoQB3oLQzIJTkldg",
                "field_map": {
                    "passage_text": "passage_embedding"
                }
            }
        }
    ]
}
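To check a pipeline like this before re-ingesting documents, the ingest simulate API can be used (a sketch; if the mapping is right, the simulated response should show a passage_embedding vector added to the document):

```json
POST _ingest/pipeline/nlp-pipeline/_simulate
{
    "docs": [
        {
            "_source": {
                "passage_text": "Hello world"
            }
        }
    ]
}
```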

Moreover, looking at the output provided, the error response could include more detail. I am cutting a GitHub issue to fix the error response.

Also, it seems you followed the documentation, and I can see that it is broken. I will cut a GitHub issue for that as well.

It works now. However, it does not allow me to insert documents into the index if I am not using the passage_text and passage_embedding fields. For example, the request below returns the empty docs error. This breaks the normal behaviour of OpenSearch: if I have a large index containing some documents with passage_text and others without, I would not be able to insert the documents without passage_text. It also breaks the update API below. I think the embedding fields should be optional, so that they can be inserted/updated without affecting the normal behaviour of existing APIs.

PUT my-nlp-index-1/_doc/2
{
    "name": "doc1"
}

POST my-nlp-index-1/_update/2
{
    "doc": {
        "name": "new_name"
    }
}

One way around this is to pass "ignore_failure": true in the pipeline. This will fix the issue for now. In the meantime, we are coming up with a permanent fix. I will post the GitHub issue to track it.
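Concretely, that workaround looks like this (a sketch based on the pipeline earlier in the thread; ignore_failure is a standard ingest processor parameter that makes ingestion proceed even when the processor fails):

```json
PUT _ingest/pipeline/nlp-pipeline
{
    "description": "An example neural search pipeline",
    "processors": [
        {
            "text_embedding": {
                "model_id": "kI6NhoQB3oLQzIJTkldg",
                "field_map": {
                    "passage_text": "passage_embedding"
                },
                "ignore_failure": true
            }
        }
    ]
}
```

With this in place, documents without passage_text are indexed normally; they simply get no passage_embedding field.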

Github issue: IllegalArgumentException when all embedding fields not shown or doing a partial update without embedding fields · Issue #73 · opensearch-project/neural-search · GitHub

Question: How is embedding generation going to be done for large documents? As you know, BERT-based models are limited to a maximum input length, which applies only to the first tokens and omits the rest. So if the model's limit is 384 tokens and the document is 1000 tokens, the words beyond that limit will be missed, and thus the embedding will not be that reliable.

Is there any chunking happening behind the scenes, or do you rely on the user to do the chunking prior to ingestion? If it is the latter, what kind of chunking do you recommend?

From a user perspective, they would want to search for a phrase and get a contextually matching document regardless of where the context is located in the text or how large the text is. Also, a highlighted result would be more useful.

> Question: How is embedding generation going to be done for large documents? As you know, BERT-based models are limited to a maximum input length, which applies only to the first tokens and omits the rest. So if the model's limit is 384 tokens and the document is 1000 tokens, the words beyond that limit will be missed, and thus the embedding will not be that reliable.
> Is there any chunking happening behind the scenes, or do you rely on the user to do the chunking prior to ingestion? If it is the latter, what kind of chunking do you recommend?

No, there is no chunking happening behind the scenes; the user needs to do the chunking before ingesting the documents into the cluster. You might want to check this response on the RFC to see what works for you: [RFC] OpenSearch neural-search plugin · Issue #11 · opensearch-project/neural-search · GitHub

Neural search depends on models running in the ml-commons model-serving framework (refer to Model-serving framework - OpenSearch documentation). The model-serving framework creates a Huggingface tokenizer from the model's tokenizer.json file, which includes the truncation logic for large documents. You can tune the tokenizer.json file to control how truncation is done.
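For illustration, the truncation section of a Huggingface tokenizer.json typically looks like the fragment below (the values shown are examples, not the settings of any particular model; max_length in particular varies by model):

```json
{
    "truncation": {
        "direction": "Right",
        "max_length": 512,
        "strategy": "LongestFirst",
        "stride": 0
    }
}
```

Lowering max_length truncates inputs more aggressively, while chunking documents before ingestion (as discussed above) avoids losing the truncated text altogether.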