[RFC] neural sparse models improvement plan

Hi @zhichao-aws,

Our main pain points are ingestion throughput (1) and search latency (3). Both are mainly caused by the thread pool shared within OS.

Another pain point was 4b: it was not obvious from the docs that the model cannot be deployed inside OS and has to be deployed in SageMaker instead.

Relevant forum threads with additional details:

Thanks for working on this!