[RFC] neural sparse models improvement plan

zhichao-aws · May 8, 2024, 8:05am

Hi all, I’m a contributor for the neural sparse feature. Now we have several directions for improving the neural sparse feature, and we want to collect some feedback directly from our users.

Here are several directions to improve neural sparse features,

Smaller model size and faster model inference. This will increase neural sparse model ingestion throughput, and speed up the bi-encoder search
Better search relevance.
Speeds up the end2end search latency
Easy to use improvements.
a. code demos to use the neural sparse ingest/search end2end
b. code demos to deploy neural sparse models at GPU endpoints (SageMaker or self-hosted text encoding).
c. code demos to fine-tune neural sparse at customized dataset
d. provide one-click API to deploy the models and setup ingest/search pipelines
provide multi-lingual sparse-encoding models
provide multi-modal sparse-encoding models

We want to collect feedback from our users and this help us prioritize our work. Could you please leave your comments about these improvements?

malv007 · May 9, 2024, 1:37pm

Other suggestions:

Include a code demo using text chunking and sparse vectors in nested fields
Make sure ML tools (RAG tool, etc.) support sparse vectors in nested fields

Thanks so much!

grunt-solaces.0h · May 23, 2024, 8:25am

Hi @zhichao-aws,

Our main pain points are around ingestion throughput (1) and search latency (3). This is mainly due to the shared threadpool in OS.

Another one was 4b, as it was not obvious from the docs that the model cannot be deployed inside OS and we need to deploy in SageMaker.

Relevant forum threads with additional details:

model registration / deployment and bit of performance detailas: How to register sparse encoding model in AWS OpenSearch - #15 by darvel
ingestion and search performance: How to scale neural sparse ingestion pipeline

Thanks for working on this!

zhichao-aws · May 24, 2024, 6:08am

Hi @grunt-solaces.0h , for ingestion throughput, we’re training models with 0.5x parameters and 0.25x parameters with enhanced pretraining procedures. Now we’re seeing primitive results and I believe we can see these new models in a few months.
And after the ml-commons throttling issues get fixed, we can achieve the high throughput with large batch size

For search latency, we strongly recommend to upgrade to AOS 2.13, which is GA several days ago. The retrieval latency on inverted index gets speed up at large margin, and we’ll also publish a blog to tell the speed up and do some quantative analysis. With AOS 2.13, we can also deploy the model inside OS with API like this:

POST /_plugins/_ml/models/_register
{
    "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1",
    "version": "1.0.1",
    "model_format": "TORCH_SCRIPT"
}

In 2.15 we plan to onboard the neural sparse 2-phase search, which can speeds up the bi-encoder raw sparse vector retrieval time 4x~8x in our experiments.

jakabasej5 · May 28, 2024, 12:25am

indeed thanks!

zhichao-aws · June 6, 2024, 9:48am

hi @malv007 @jakabasej5 I created a repo to put some sample codes. Now I’ve put the sample using chunking processor + neural sparse on it. You can cut a issue to request for other samples.

For the ML tools, I created a issue in skills repo: [RFC] support customized search query tool · Issue #318 · opensearch-project/skills · GitHub. You can provide your comments/feedback on the issue so we can prioritize it in the next release. Thanks!

zhichao-aws · July 17, 2024, 2:49am

Hi @malv007 @jakabasej5 , we have supported the nested field in neural sparse tool, vectorDB tool and RAGTool. [Feature] support nested query in neural sparse tool, vectorDB tool and RAG tool by zhichao-aws · Pull Request #350 · opensearch-project/skills · GitHub
It will be available in the 2.16 release

Topic		Replies	Views
How to scale neural sparse ingestion pipeline OpenSearch	3	323	May 7, 2024
Model weights for sparse encoders Machine Learning	20	600	February 9, 2024
[Feedback] Neural Search plugin - experimental release General Feedback releases	42	3614	July 18, 2023
The new sparse encoding model is not deployable Machine Learning	4	85	October 30, 2024
[Feedback] Machine Learning Model Serving Framework - Experimental Release General Feedback releases	48	2996	July 12, 2023

[RFC] neural sparse models improvement plan

Related topics