At a high level, what you need to do is have a GPU machine running with the image that provides the remote vector index build service; here is the user guide.
Once that is done, you need to allow the OpenSearch cluster to talk to that machine by enabling a few settings here: Settings - OpenSearch Documentation. This ensures that your GPU fleet for index builds is ready to take index build requests.
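As a rough sketch, enabling it could look something like the request below. Note that the exact setting keys here are my assumptions for illustration, so treat the Settings page above as the source of truth:

```bash
# NOTE: the setting names below are assumptions for illustration only --
# confirm the exact keys on the Settings page linked above.
curl -XPUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "knn.remote_index_build.enabled": true,
    "knn.remote_index_build.repository": "vector-build-repo"
  }
}'
```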
Another thing to note: GPU-based acceleration is only available for indexing. Searches still happen on the data nodes.
Please let me know if you have more questions. I would be happy to help here.
@Navneet @pablo: thank you for covering solutions at two different ends of the spectrum.
I have a few questions on the same:
@Navneet: Thank you for the correction and all the references. When building the GPU fleet and running the index build service on that GPU component, should it be part of the same OpenSearch cluster with a different node role, or completely outside the OpenSearch cluster, just as a remote Pod?
And is it necessary to have intermediate object storage like S3, GCS, a DB, etc., or can it happen on the fly and return directly to k-NN on the OpenSearch data/ML nodes?
Also, the embeddings that are passed to the service to build the HNSW graph can already be pre-generated, correct?
– Thanks.
@pablo: Thank you for providing an interesting solution via Docker, where the same GPU data/ML node can leverage the GPU for both indexing and searching.
Can I get an equivalent values.yaml after building the custom image with the above Dockerfile, if I am running this on a GKE cluster?
How can I identify which GPU machine types on GCP work well with NVIDIA?
Were there any significant improvements in indexing and search latency for dense embeddings?
Do I need the .pt version of the model for query-time inferencing, or is the ONNX version compatible as well?
This info would be helpful to understand and proceed further.
The k-NN plugin will upload the intermediate object. You just need to configure the repository-s3 plugin.
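For reference, registering the S3 repository uses the standard snapshot repository API; a minimal sketch (the bucket name, region, and repository name are placeholders):

```bash
# Register an S3 repository for the intermediate objects
# (requires the repository-s3 plugin installed on the cluster).
# Bucket, region, and repository name below are placeholders.
curl -XPUT "http://localhost:9200/_snapshot/vector-build-repo" \
  -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "my-vector-build-bucket",
    "region": "us-east-1"
  }
}'
```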
> Also, the embeddings that are passed to the service to build the HNSW graph can already be pre-generated, correct?
No, you have to give them to OpenSearch, just like with normal indexing. The acceleration happens in the background, and the vectors will be uploaded by the k-NN plugin.
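To make that concrete, pre-generated vectors are ingested through the normal index and bulk APIs; a minimal sketch (the index name, field name, and dimension are just examples):

```bash
# Create a k-NN index with a vector field (index/field names and
# dimension are illustrative examples).
curl -XPUT "http://localhost:9200/my-vectors" \
  -H 'Content-Type: application/json' -d'
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 4,
        "method": { "name": "hnsw", "engine": "faiss" }
      }
    }
  }
}'

# Bulk-ingest documents with pre-generated embeddings, like any normal indexing.
curl -XPOST "http://localhost:9200/_bulk" \
  -H 'Content-Type: application/x-ndjson' -d'
{ "index": { "_index": "my-vectors", "_id": "1" } }
{ "embedding": [0.12, 0.33, 0.81, 0.47] }
'
```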
Unfortunately, I don’t have much experience with cloud GPU VMs.
My testing was very shallow, as I was only testing the possibility of using a GPU with OpenSearch. I used some sample data (10,000 documents) for ingest and search.
I tried building a custom image for the data/ML nodes like the one above, but it doesn't seem to work. It would be good to wait for built-in support from OpenSearch for data and ML nodes with a few config changes.
A GPU fleet isn't a feasible option, as it costs so much on top of the CPUs for the data and ML nodes. Offline embedding generation and ingestion works better. Of course, the GPU fleet might help save a couple of minutes during indexing, but that is negligible compared to the cost spent.
This is my opinion; please correct me if I am wrong. @Navneet