At a high level, what you need to do is have a GPU machine running with the image that provides the remote vector index build service; here is the user guide.
Once that is done, you need to allow the OpenSearch cluster to talk to that machine by enabling a few settings here: Settings - OpenSearch Documentation. This ensures that your GPU fleet for index builds is ready to take index build requests.
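As a rough sketch, enabling it could look something like the request below. Note that the exact setting keys here are my assumptions for illustration, so treat the Settings page above as the source of truth:

```bash
# NOTE: the setting names below are assumptions for illustration only --
# confirm the exact keys on the Settings page linked above.
curl -XPUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "knn.remote_index_build.enabled": true,
    "knn.remote_index_build.repository": "vector-build-repo"
  }
}'
```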
Another thing to note: GPU-based acceleration is only available for indexing. Searches still happen on the data nodes.
Please let me know if you have more questions. I would be happy to help here.
@Navneet @pablo: thank you for covering solutions at two different ends of the spectrum.
I have a few questions on the same:
@Navneet: Thank you for the correction and all the references. When building the GPU fleet and running the index build service on that GPU component, should it be part of the same OpenSearch cluster with a different node role, or completely outside the OpenSearch cluster, just as a remote Pod?
And is it necessary to have intermediate object storage like S3, GCS, a DB, etc., or can it happen on the fly and return directly to k-NN on the OpenSearch data/ML nodes?
Also, the embeddings that are passed to the service to build the HNSW graph can already be pre-generated, correct?
– Thanks.
@pablo: Thank you for providing an interesting solution via Docker, where the same GPU data/ML node can leverage the GPU for both indexing and searching.
Can I get an equivalent values.yaml after building the custom image with the above Dockerfile, if I am running this on a GKE cluster?
How can I identify which GPU machine types on GCP work well with NVIDIA?
Were there any significant improvements in indexing and search latency for dense embeddings?
Do I need the .pt version of the model for query-time inferencing, or is the ONNX version compatible as well?
This info would be helpful to understand and proceed further.
The k-NN plugin will upload the intermediate object. You just need to configure the repository-s3 plugin.
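For reference, registering the S3 repository uses the standard snapshot repository API; a minimal sketch (the bucket name, region, and repository name are placeholders):

```bash
# Register an S3 repository for the intermediate objects
# (requires the repository-s3 plugin installed on the cluster).
# Bucket, region, and repository name below are placeholders.
curl -XPUT "http://localhost:9200/_snapshot/vector-build-repo" \
  -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "my-vector-build-bucket",
    "region": "us-east-1"
  }
}'
```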
> Also, the embeddings that are passed to the service to build the HNSW graph can already be pre-generated, correct?
No, you have to give them to OpenSearch, just like with normal indexing. The acceleration happens in the background, and the vectors will be uploaded by the k-NN plugin.
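To make that concrete, pre-generated vectors are ingested through the normal index and bulk APIs; a minimal sketch (the index name, field name, and dimension are just examples):

```bash
# Create a k-NN index with a vector field (index/field names and
# dimension are illustrative examples).
curl -XPUT "http://localhost:9200/my-vectors" \
  -H 'Content-Type: application/json' -d'
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 4,
        "method": { "name": "hnsw", "engine": "faiss" }
      }
    }
  }
}'

# Bulk-ingest documents with pre-generated embeddings, like any normal indexing.
curl -XPOST "http://localhost:9200/_bulk" \
  -H 'Content-Type: application/x-ndjson' -d'
{ "index": { "_index": "my-vectors", "_id": "1" } }
{ "embedding": [0.12, 0.33, 0.81, 0.47] }
'
```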
Unfortunately, I don’t have much experience with cloud GPU VMs.
My testing was very shallow, as I was only testing the possibility of using a GPU with OpenSearch. I used some sample data (10,000 documents) for ingest and search.
I tried building a custom image for the data/ML nodes like the one above, but it doesn't seem to work. It would be good to wait for built-in support from OpenSearch for data and ML nodes with a few config changes.
A GPU fleet isn't a feasible option, as it costs so much on top of the CPUs for the data and ML nodes. Offline embedding generation and ingestion works better. Of course, the GPU fleet might help save a couple of minutes during indexing, but that is negligible compared to the cost spent.
This is my opinion; please correct me if I am wrong. @Navneet