How to register a sparse encoding model in AWS OpenSearch

Hi @xinyual,

Thanks for the answer!

We were wondering about the thread number for the following reasons:

  1. We switched to a larger machine with more CPUs, and as a result the ingestion pipeline seems to send more documents for inference in parallel
  2. At first we used a CPU instance for inference, and it was underutilised: it was processing only X documents at a time, where X seemed to match the number of CPUs in the OpenSearch instance
  3. We then switched to a GPU instance for inference, which is again underutilised. It does the inference faster per document, but the bottleneck still seems to be the ingestion pipeline, which doesn’t send as many documents to the SageMaker instance as it is capable of handling
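In case it helps, one knob we have been looking at is the connector's HTTP client configuration. If we have read the connector blueprint docs correctly, `client_config.max_connection` caps how many concurrent requests ML Commons will send to the remote endpoint, which could explain the underutilisation we are seeing. A rough sketch of a SageMaker connector body with that setting raised (region, role ARN, endpoint name, and request body are placeholders, not our actual values):

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "sagemaker-sparse-encoding-connector",
  "description": "Connector to a SageMaker sparse encoding endpoint",
  "version": 1,
  "protocol": "aws_sigv4",
  "credential": {
    "roleArn": "arn:aws:iam::<account-id>:role/<role-name>"
  },
  "parameters": {
    "region": "us-east-1",
    "service_name": "sagemaker"
  },
  "client_config": {
    "max_connection": 100,
    "connection_timeout": 5000,
    "read_timeout": 60000
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/<endpoint-name>/invocations",
      "headers": {
        "content-type": "application/json"
      },
      "request_body": "..."
    }
  ]
}
```

We haven't confirmed yet whether raising `max_connection` alone unblocks the pipeline, or whether the ingest-side parallelism (bulk request concurrency on the OpenSearch node) is the real limit, so treat this as a sketch rather than a verified fix.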