Use GPU acceleration for pipeline with text_embedding

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

  • OpenSearch 2.11.0
  • EC2 g4dn.xlarge with AMI “AWS Deep Learning Base AMI GPU CUDA 11 (Ubuntu 20.04) 20230403” (ami-0fa79b4479d6a7310)
  • Docker 23.0.2
  • NVIDIA driver: NVIDIA-SMI 525.85.12, Driver Version: 525.85.12, CUDA Version: 12.0

Describe the issue:

Hello everyone,

As the title mentions, I’m trying to use GPU acceleration for an ingest pipeline that uses text_embedding. Indexing works fine and the embedding vectors are calculated correctly, but the GPU does nothing during the whole process.

Configuration:

My application runs in a Docker Compose stack on an EC2 instance.
So, with the help of this documentation, I configured two nodes:

  • One “main” node based on the opensearchproject/opensearch:2.11.0 docker image
  • One “ml” node based on the nvidia/cuda:11.6.1-base-centos7 docker image, in which I installed OpenSearch

I installed the NVIDIA Container Toolkit and gave the OpenSearch ML container access to the GPU.
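Concretely, GPU access for a service can be granted with a Compose device reservation along these lines (a minimal sketch of the relevant fragment, assuming Compose v2 and the NVIDIA Container Toolkit; my full stack is in the configuration details below):

services:
    search-ml:
        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          count: 1
                          capabilities: [gpu]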

I use the pretrained huggingface/sentence-transformers/all-MiniLM-L6-v2 model.
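For reference, registering this pretrained model looks roughly like this (a sketch; I believe the documented version string for this pretrained model is 1.0.1, and the model group ID is the one from my setup below):

POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "version": "1.0.1",
  "model_group_id": "URoCjY0BnzFjNeeq8dLX",
  "model_format": "TORCH_SCRIPT"
}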

I check GPU usage by running nvidia-smi in the OpenSearch ML container while I index my data: no process is listed and utilization stays at 0%.
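For example, from the host (assuming Compose v2’s exec subcommand and the search-ml service name from my stack):

docker compose exec search-ml nvidia-smi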

If you have any idea why it doesn’t work as expected, I’ll be glad to hear it.

Thank you
Pierre

Configuration details:

The docker compose stack
version: "3.4"

services:
    search:
        build:
            context: .
            target: test_opensearch2
        environment:
            - cluster.name=os-docker-cluster     # Search cluster name
            - node.name=opensearch-node-data     # Name the node that will run in this container
            - bootstrap.memory_lock=true         # Disable JVM heap memory swapping
            - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # Set min and max JVM heap sizes to at least 50% of system RAM
            - plugins.ml_commons.allow_registering_model_via_url=true
            - discovery.seed_hosts=search        # Nodes to look for when discovering the cluster
            - cluster.initial_cluster_manager_nodes=opensearch-node-data # Nodes eligible to serve as cluster manager
        volumes:
            - os2_data:/usr/share/opensearch/data:rw
        ulimits:
            memlock:
                soft: -1
                hard: -1
        ports:
            - 9200:9200
    
    search-ml:
        build:
            context: .
            target: test_opensearch2_gpu
        environment:
            - cluster.name=os-docker-cluster     # Search cluster name
            - node.name=opensearch-node-ml # Name the node that will run in this container
            - bootstrap.memory_lock=true         # Disable JVM heap memory swapping
            - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # Set min and max JVM heap sizes to at least 50% of system RAM
            - node.roles=ml
            - plugins.ml_commons.allow_registering_model_via_url=true
            - discovery.seed_hosts=search # Nodes to look for when discovering the cluster
            - cluster.initial_cluster_manager_nodes=opensearch-node-data # Nodes eligible to serve as cluster manager
        ulimits:
            memlock:
                soft: -1
                hard: -1
        volumes:
            - os2_lm_data:/usr/share/opensearch/data:rw

volumes:
    os2_data:
    os2_lm_data:
The Dockerfile
ARG OPENSEARCH2_VERSION=2.11.0
    
FROM opensearchproject/opensearch:${OPENSEARCH2_VERSION} AS test_opensearch2

WORKDIR /usr/share/opensearch

RUN bin/opensearch-plugin install analysis-phonetic
RUN bin/opensearch-plugin install analysis-icu

FROM amazonlinux:2 AS test_opensearch2_gpu_builder

ARG OPENSEARCH2_VERSION
ARG UID=1000
ARG GID=1000
ARG TEMP_DIR=/tmp/opensearch
ARG OPENSEARCH_HOME=/usr/share/opensearch
ARG OPENSEARCH_PATH_CONF=$OPENSEARCH_HOME/config
ARG SECURITY_PLUGIN_DIR=$OPENSEARCH_HOME/plugins/opensearch-security
ARG PERFORMANCE_ANALYZER_PLUGIN_CONFIG_DIR=$OPENSEARCH_PATH_CONF/opensearch-performance-analyzer
ARG OS_VERSION=${OPENSEARCH2_VERSION}
  # Update packages
  # Install the tools we need: tar and gzip to unpack the OpenSearch tarball, and shadow-utils to give us `groupadd` and `useradd`.
  # Install which to allow running of securityadmin.sh
RUN yum update -y && yum install -y tar gzip shadow-utils which && yum clean all
  
  # Create an opensearch user, group, and directory
RUN groupadd -g $GID opensearch && \
adduser -u $UID -g $GID -d $OPENSEARCH_HOME opensearch && \
mkdir $TEMP_DIR

RUN mkdir /usr/share/elasticsearch
WORKDIR /usr/share/elasticsearch

RUN set -eux ; \
cur_arch="" ; \
case "$(arch)" in \
aarch64) cur_arch='arm64' ;; \
x86_64)  cur_arch='x64' ;; \
*) echo >&2 ; echo >&2 "Unsupported architecture $(arch)" ; echo >&2 ; exit 1 ;; \
esac ; \
curl --retry 10 -S -L --output $TEMP_DIR/opensearch.tar.gz https://artifacts.opensearch.org/releases/bundle/opensearch/$OS_VERSION/opensearch-$OS_VERSION-linux-$cur_arch.tar.gz; \
curl --output $TEMP_DIR/opensearch.pgp https://artifacts.opensearch.org/publickeys/opensearch.pgp; \
gpg --import $TEMP_DIR/opensearch.pgp; \
curl --output $TEMP_DIR/opensearch.tar.gz.sig https://artifacts.opensearch.org/releases/bundle/opensearch/$OS_VERSION/opensearch-$OS_VERSION-linux-$cur_arch.tar.gz.sig; \
gpg --verify $TEMP_DIR/opensearch.tar.gz.sig $TEMP_DIR/opensearch.tar.gz;

RUN tar --warning=no-timestamp -zxf $TEMP_DIR/opensearch.tar.gz -C $OPENSEARCH_HOME --strip-components=1 && \
mkdir -p $OPENSEARCH_HOME/data && chown -Rv $UID:$GID $OPENSEARCH_HOME/data && \
if [[ -d $SECURITY_PLUGIN_DIR ]] ; then chmod -v 750 $SECURITY_PLUGIN_DIR/tools/* ; fi && \
rm -rf $TEMP_DIR

COPY docker/opensearch/config/* $OPENSEARCH_PATH_CONF/
COPY docker/opensearch/bin/* $OPENSEARCH_HOME/
RUN if [[ -d $PERFORMANCE_ANALYZER_PLUGIN_CONFIG_DIR ]] ; then mv $OPENSEARCH_PATH_CONF/performance-analyzer.properties $PERFORMANCE_ANALYZER_PLUGIN_CONFIG_DIR/ ; fi

FROM nvidia/cuda:11.6.1-base-centos7 AS test_opensearch2_gpu

ARG UID=1000
ARG GID=1000
ARG OPENSEARCH_HOME=/usr/share/opensearch
# Keep the label version in sync with the bundled OpenSearch version
ARG OPENSEARCH2_VERSION
ARG OS_VERSION=${OPENSEARCH2_VERSION}

RUN yum update -y && yum install -y tar gzip shadow-utils which && yum clean all
  
  # Create an opensearch user, group
RUN groupadd -g $GID opensearch && \
adduser -u $UID -g $GID -d $OPENSEARCH_HOME opensearch
  
  # Copy from Stage0
COPY --from=test_opensearch2_gpu_builder --chown=$UID:$GID $OPENSEARCH_HOME $OPENSEARCH_HOME
WORKDIR $OPENSEARCH_HOME
  
  # Set $JAVA_HOME
RUN echo "export JAVA_HOME=$OPENSEARCH_HOME/jdk" >> /etc/profile.d/java_home.sh && \
echo "export PATH=\$PATH:\$JAVA_HOME/bin" >> /etc/profile.d/java_home.sh
ENV JAVA_HOME=$OPENSEARCH_HOME/jdk
ENV PATH=$PATH:$JAVA_HOME/bin:$OPENSEARCH_HOME/bin
  
  # Add k-NN lib directory to library loading path variable
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$OPENSEARCH_HOME/plugins/opensearch-knn/lib"
  
  # Change user
USER $UID
  
  # Setup OpenSearch
  # Disable security demo installation during image build, and allow user to disable during startup of the container
  # Enable security plugin during image build, and allow user to disable during startup of the container
ARG DISABLE_INSTALL_DEMO_CONFIG=true
ARG DISABLE_SECURITY_PLUGIN=false
RUN ./opensearch-onetime-setup.sh

RUN bin/opensearch-plugin install analysis-phonetic
RUN bin/opensearch-plugin install analysis-icu

EXPOSE 9200 9300 9600 9650
  
# Label
LABEL org.label-schema.schema-version="1.0" \
org.label-schema.name="opensearch" \
org.label-schema.version="$OS_VERSION" \
org.label-schema.url="https://opensearch.org" \
org.label-schema.vcs-url="https://github.com/OpenSearch" \
org.label-schema.license="Apache-2.0" \
org.label-schema.vendor="OpenSearch"
  
# CMD to run
ENTRYPOINT ["./opensearch-docker-entrypoint.sh"]
CMD ["opensearch"]
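As a quick sanity check that the NVIDIA runtime is wired up at all, nvidia-smi can be run against the bare CUDA base image (outside the stack; this only verifies the host/toolkit plumbing, not OpenSearch itself):

docker run --rm --gpus all nvidia/cuda:11.6.1-base-centos7 nvidia-smi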
The deployed model
{ 
  "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "model_group_id": "URoCjY0BnzFjNeeq8dLX",
  "algorithm": "TEXT_EMBEDDING",
  "model_version": "1",
  "model_format": "TORCH_SCRIPT",
  "model_state": "DEPLOYED",
  "model_content_size_in_bytes": 91790008,
  "model_content_hash_value": "c15f0d2e62d872be5b5bc6c84d2e0f4921541e29fefbef51d59cc10a8ae30e0f",
  "model_config": { - 
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "SENTENCE_TRANSFORMERS",
    "all_config": "{\"_name_or_path\":\"nreimers/MiniLM-L6-H384-uncased\",\"architectures\":[\"BertModel\"],\"attention_probs_dropout_prob\":0.1,\"gradient_checkpointing\":false,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":384,\"initializer_range\":0.02,\"intermediate_size\":1536,\"layer_norm_eps\":1e-12,\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":6,\"pad_token_id\":0,\"position_embedding_type\":\"absolute\",\"transformers_version\":\"4.8.2\",\"type_vocab_size\":2,\"use_cache\":true,\"vocab_size\":30522}"
  },
  "created_time": 1707469090413,
  "last_updated_time": 1707471239264,
  "last_registered_time": 1707469096582,
  "last_deployed_time": 1707471239264,
  "total_chunks": 10,
  "planning_worker_node_count": 1,
  "current_worker_node_count": 1,
  "planning_worker_nodes": [ - 
    "1Gv9j7-8RQKDbMNcgTc35g"
  ],
  "deploy_to_all_nodes": true
}
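To confirm which node actually hosts the model at inference time, the ML Commons profile API can be queried (a sketch, using the model ID referenced by my ingest pipeline below):

GET /_plugins/_ml/profile/models/cE0WjY0B9ALueO2pruv7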
Ingest pipeline definition
{  
  "test-ingest-pipeline": { - 
    "description": "test-ingest-pipeline",
    "processors": [ 
      {
        "set": {
          "field": "text_embedding",
          "value": [  
            ""
          ]
        }
      },
      {  
        "append": { 
          "field": "text_embedding",
          "if": "(ctx['name'] instanceof String)",
          "value": "The name is {{{name}}}"
        }
      },
      { 
        "text_embedding": {  
          "model_id": "cE0WjY0B9ALueO2pruv7",
          "field_map": {  
            "text_embedding": "embedding"
          }
        }
      }
    ]
  }
}
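For completeness, the pipeline can be exercised without indexing real documents via the simulate API (the document fields here are just examples):

POST _ingest/pipeline/test-ingest-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "name": "Pierre"
      }
    }
  ]
}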

@Pigau,

Could you please take a look at this article: GPU acceleration - OpenSearch Documentation

Please let me know if that works.

Thanks
Dhrubo

Hello @dhrubo ,

Thank you for your response.
Yes, I followed this documentation and prepared an NVIDIA ML node as mentioned in it.

However, the PyTorch part is not very clear to me. Do I need to install PyTorch in the OpenSearch image in order to allow OpenSearch to use the GPU?
I ask because I only installed OpenSearch in the image based on nvidia/cuda, and PyTorch seems to be embedded in the OpenSearch package, since my MiniLM model computes vectors without me installing anything else.