Hi all, I’m a contributor to the neural sparse feature. We have several directions in mind for improving it, and we want to collect some feedback directly from our users.
Here are the directions we’re considering:
- Smaller model size and faster model inference. This will increase ingestion throughput for neural sparse models and speed up bi-encoder search.
- Better search relevance.
- Lower end-to-end search latency.
- Ease-of-use improvements:
a. code demos for using neural sparse ingest/search end to end (see the sketch after this list)
b. code demos for deploying neural sparse models on GPU endpoints (SageMaker or self-hosted text encoding)
c. code demos for fine-tuning neural sparse models on a customized dataset
d. a one-click API to deploy the models and set up ingest/search pipelines
- provide multi-lingual sparse-encoding models
- provide multi-modal sparse-encoding models
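To make item 4a concrete, here is a minimal sketch of the end-to-end flow with the current APIs: an ingest pipeline using the sparse_encoding processor, followed by a neural_sparse query. The index name, field names, and model_id are placeholders.

PUT /_ingest/pipeline/neural-sparse-pipeline
{
  "processors": [
    {
      "sparse_encoding": {
        "model_id": "<model_id>",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}

GET /my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "what improvements are planned for neural sparse?",
        "model_id": "<model_id>"
      }
    }
  }
}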
This feedback will help us prioritize our work. Could you please leave your comments on these improvements?
Hi @zhichao-aws,
Our main pain points are around ingestion throughput (1) and search latency (3). This is mainly due to the shared threadpool in OS.
Another one was 4b, as it was not obvious from the docs that the model cannot be deployed inside OS and we need to deploy in SageMaker.
Relevant forum threads with additional details:
Thanks for working on this!
Hi @grunt-solaces.0h , for ingestion throughput, we’re training models with 0.5x and 0.25x the parameters, using enhanced pretraining procedures. We’re seeing promising preliminary results, and I believe these new models will be available in a few months.
And once the ml-commons throttling issues are fixed, we can achieve high throughput with large batch sizes.
For search latency, we strongly recommend upgrading to AOS 2.13, which went GA a few days ago. Retrieval latency on the inverted index has been sped up by a large margin, and we’ll also publish a blog post describing the speedup with some quantitative analysis. With AOS 2.13, you can also deploy the model inside OpenSearch with an API call like this:
POST /_plugins/_ml/models/_register
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}
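The register call returns a task ID. Once the task completes, a minimal sketch of the follow-up steps looks like this (the <task_id> and <model_id> values are placeholders returned by the previous calls):

GET /_plugins/_ml/tasks/<task_id>

POST /_plugins/_ml/models/<model_id>/_deploy

The task response contains the model_id, and the deploy call loads the model so the ingest and search pipelines can use it.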
In 2.15 we plan to onboard neural sparse 2-phase search, which can speed up the bi-encoder raw sparse vector retrieval time by 4x~8x in our experiments.
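Since this is still in planning, the exact API may change, but a rough sketch of how such a two-phase search pipeline could be configured (the processor and parameter names here are assumptions, not final):

PUT /_search/pipeline/neural-sparse-two-phase-pipeline
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "enabled": true,
        "two_phase_parameter": {
          "prune_ratio": 0.4,
          "expansion_rate": 5.0,
          "max_window_size": 10000
        }
      }
    }
  ]
}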
Hi @malv007 @jakabasej5 , I created a repo with some sample code. I’ve put a sample using the chunking processor + neural sparse in it. You can cut an issue to request other samples.
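For reference, a minimal sketch of what a chunking + neural sparse ingest pipeline can look like (field names, chunking parameters, and the model_id are placeholders, not the exact sample in the repo):

PUT /_ingest/pipeline/chunking-sparse-pipeline
{
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 384,
            "overlap_rate": 0.2
          }
        },
        "field_map": {
          "body": "body_chunks"
        }
      }
    },
    {
      "sparse_encoding": {
        "model_id": "<model_id>",
        "field_map": {
          "body_chunks": "body_chunks_embedding"
        }
      }
    }
  ]
}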
For the ML tools, I created an issue in the skills repo: [RFC] support customized search query tool · Issue #318 · opensearch-project/skills · GitHub. Please leave your comments/feedback on the issue so we can prioritize it for the next release. Thanks!