Use GPU acceleration for pipeline with text_embedding

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

  • OpenSearch 2.11.0
  • EC2 g4dn.xlarge with AMI “AWS Deep Learning Base AMI GPU CUDA 11 (Ubuntu 20.04) 20230403” (ami-0fa79b4479d6a7310)
  • Docker 23.0.2
  • NVIDIA driver: NVIDIA-SMI 525.85.12, Driver Version: 525.85.12, CUDA Version: 12.0

Describe the issue:

Hello everyone,

As the title mentions, I’m trying to use GPU acceleration for an ingest pipeline that uses text_embedding. Indexing works fine and the embedding vectors are calculated correctly, but the GPU does nothing during the whole process.

Configuration:

My application runs in a Docker Compose stack on an EC2 instance.
So, with the help of this documentation, I configured two nodes:

  • One “main” node based on the opensearchproject/opensearch:2.11.0 docker image
  • One “ml” node based on the nvidia/cuda:11.6.1-base-centos7 docker image, in which I installed OpenSearch

I installed the NVIDIA Container Toolkit and gave the OpenSearch ML container access to the GPU.
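Concretely, GPU access for a service can be granted with a Compose device reservation along these lines (a minimal sketch of the relevant fragment, assuming Compose v2 and the NVIDIA Container Toolkit; my full stack is in the configuration details below):

services:
    search-ml:
        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          count: 1
                          capabilities: [gpu]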

I use the pretrained huggingface/sentence-transformers/all-MiniLM-L6-v2 model.
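For reference, registering this pretrained model looks roughly like this (a sketch; I believe the documented version string for this pretrained model is 1.0.1, and the model group ID is the one from my setup below):

POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "version": "1.0.1",
  "model_group_id": "URoCjY0BnzFjNeeq8dLX",
  "model_format": "TORCH_SCRIPT"
}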

I check GPU usage by running nvidia-smi in the OpenSearch ML container while I index my data: no process is listed and utilization stays at 0%.
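For example, from the host (assuming Compose v2’s exec subcommand and the search-ml service name from my stack):

docker compose exec search-ml nvidia-smi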

If you have any idea why it doesn’t work as expected, I’ll be glad to hear it.

Thank you
Pierre

Configuration details:

The docker compose stack
version: "3.4"

services:
    search:
        build:
            context: .
            target: test_opensearch2
        environment:
            - cluster.name=os-docker-cluster     # Search cluster name
            - node.name=opensearch-node-data     # Name the node that will run in this container
            - bootstrap.memory_lock=true         # Disable JVM heap memory swapping
            - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # Set min and max JVM heap sizes to at least 50% of system RAM
            - plugins.ml_commons.allow_registering_model_via_url=true
            - discovery.seed_hosts=search        # Nodes to look for when discovering the cluster
            - cluster.initial_cluster_manager_nodes=opensearch-node-data # Nodes eligible to serve as cluster manager
        volumes:
            - os2_data:/usr/share/opensearch/data:rw
        ulimits:
            memlock:
                soft: -1
                hard: -1
        ports:
            - 9200:9200
    
    search-ml:
        build:
            context: .
            target: test_opensearch2_gpu
        environment:
            - cluster.name=os-docker-cluster     # Search cluster name
            - node.name=opensearch-node-ml # Name the node that will run in this container
            - bootstrap.memory_lock=true         # Disable JVM heap memory swapping
            - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # Set min and max JVM heap sizes to at least 50% of system RAM
            - node.roles=ml
            - plugins.ml_commons.allow_registering_model_via_url=true
            - discovery.seed_hosts=search # Nodes to look for when discovering the cluster
            - cluster.initial_cluster_manager_nodes=opensearch-node-data # Nodes eligible to serve as cluster manager
        ulimits:
            memlock:
                soft: -1
                hard: -1
        volumes:
            - os2_lm_data:/usr/share/opensearch/data:rw

volumes:
    os2_data:
    os2_lm_data:
The Dockerfile
ARG OPENSEARCH2_VERSION=2.11.0
    
FROM opensearchproject/opensearch:${OPENSEARCH2_VERSION} AS test_opensearch2

WORKDIR /usr/share/opensearch

RUN bin/opensearch-plugin install analysis-phonetic
RUN bin/opensearch-plugin install analysis-icu

FROM amazonlinux:2 AS test_opensearch2_gpu_builder

ARG OPENSEARCH2_VERSION
ARG UID=1000
ARG GID=1000
ARG TEMP_DIR=/tmp/opensearch
ARG OPENSEARCH_HOME=/usr/share/opensearch
ARG OPENSEARCH_PATH_CONF=$OPENSEARCH_HOME/config
ARG SECURITY_PLUGIN_DIR=$OPENSEARCH_HOME/plugins/opensearch-security
ARG PERFORMANCE_ANALYZER_PLUGIN_CONFIG_DIR=$OPENSEARCH_PATH_CONF/opensearch-performance-analyzer
ARG OS_VERSION=${OPENSEARCH2_VERSION}
  # Update packages
  # Install the tools we need: tar and gzip to unpack the OpenSearch tarball, and shadow-utils to give us `groupadd` and `useradd`.
  # Install which to allow running of securityadmin.sh
RUN yum update -y && yum install -y tar gzip shadow-utils which && yum clean all
  
  # Create an opensearch user, group, and directory
RUN groupadd -g $GID opensearch && \
adduser -u $UID -g $GID -d $OPENSEARCH_HOME opensearch && \
mkdir $TEMP_DIR

RUN mkdir /usr/share/elasticsearch
WORKDIR /usr/share/elasticsearch

RUN set -eux ; \
cur_arch="" ; \
case "$(arch)" in \
aarch64) cur_arch='arm64' ;; \
x86_64)  cur_arch='x64' ;; \
*) echo >&2 ; echo >&2 "Unsupported architecture $(arch)" ; echo >&2 ; exit 1 ;; \
esac ; \
curl --retry 10 -S -L --output $TEMP_DIR/opensearch.tar.gz https://artifacts.opensearch.org/releases/bundle/opensearch/$OS_VERSION/opensearch-$OS_VERSION-linux-$cur_arch.tar.gz; \
curl --output $TEMP_DIR/opensearch.pgp https://artifacts.opensearch.org/publickeys/opensearch.pgp; \
gpg --import $TEMP_DIR/opensearch.pgp; \
curl --output $TEMP_DIR/opensearch.tar.gz.sig https://artifacts.opensearch.org/releases/bundle/opensearch/$OS_VERSION/opensearch-$OS_VERSION-linux-$cur_arch.tar.gz.sig; \
gpg --verify $TEMP_DIR/opensearch.tar.gz.sig $TEMP_DIR/opensearch.tar.gz;

RUN tar --warning=no-timestamp -zxf $TEMP_DIR/opensearch.tar.gz -C $OPENSEARCH_HOME --strip-components=1 && \
mkdir -p $OPENSEARCH_HOME/data && chown -Rv $UID:$GID $OPENSEARCH_HOME/data && \
if [[ -d $SECURITY_PLUGIN_DIR ]] ; then chmod -v 750 $SECURITY_PLUGIN_DIR/tools/* ; fi && \
rm -rf $TEMP_DIR

COPY docker/opensearch/config/* $OPENSEARCH_PATH_CONF/
COPY docker/opensearch/bin/* $OPENSEARCH_HOME/
RUN if [[ -d $PERFORMANCE_ANALYZER_PLUGIN_CONFIG_DIR ]] ; then mv $OPENSEARCH_PATH_CONF/performance-analyzer.properties $PERFORMANCE_ANALYZER_PLUGIN_CONFIG_DIR/ ; fi

FROM nvidia/cuda:11.6.1-base-centos7 AS test_opensearch2_gpu

ARG UID=1000
ARG GID=1000
ARG OPENSEARCH_HOME=/usr/share/opensearch
# Keep the label version in sync with the bundled OpenSearch version
ARG OPENSEARCH2_VERSION
ARG OS_VERSION=${OPENSEARCH2_VERSION}

RUN yum update -y && yum install -y tar gzip shadow-utils which && yum clean all
  
  # Create an opensearch user, group
RUN groupadd -g $GID opensearch && \
adduser -u $UID -g $GID -d $OPENSEARCH_HOME opensearch
  
  # Copy from Stage0
COPY --from=test_opensearch2_gpu_builder --chown=$UID:$GID $OPENSEARCH_HOME $OPENSEARCH_HOME
WORKDIR $OPENSEARCH_HOME
  
  # Set $JAVA_HOME
RUN echo "export JAVA_HOME=$OPENSEARCH_HOME/jdk" >> /etc/profile.d/java_home.sh && \
echo "export PATH=\$PATH:\$JAVA_HOME/bin" >> /etc/profile.d/java_home.sh
ENV JAVA_HOME=$OPENSEARCH_HOME/jdk
ENV PATH=$PATH:$JAVA_HOME/bin:$OPENSEARCH_HOME/bin
  
  # Add k-NN lib directory to library loading path variable
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$OPENSEARCH_HOME/plugins/opensearch-knn/lib"
  
  # Change user
USER $UID
  
  # Setup OpenSearch
  # Disable security demo installation during image build, and allow user to disable during startup of the container
  # Enable security plugin during image build, and allow user to disable during startup of the container
ARG DISABLE_INSTALL_DEMO_CONFIG=true
ARG DISABLE_SECURITY_PLUGIN=false
RUN ./opensearch-onetime-setup.sh

RUN bin/opensearch-plugin install analysis-phonetic
RUN bin/opensearch-plugin install analysis-icu

EXPOSE 9200 9300 9600 9650
  
# Label
LABEL org.label-schema.schema-version="1.0" \
org.label-schema.name="opensearch" \
org.label-schema.version="$OS_VERSION" \
org.label-schema.url="https://opensearch.org" \
org.label-schema.vcs-url="https://github.com/OpenSearch" \
org.label-schema.license="Apache-2.0" \
org.label-schema.vendor="OpenSearch"
  
# CMD to run
ENTRYPOINT ["./opensearch-docker-entrypoint.sh"]
CMD ["opensearch"]
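As a quick sanity check that the NVIDIA runtime is wired up at all, nvidia-smi can be run against the bare CUDA base image (outside the stack; this only verifies the host/toolkit plumbing, not OpenSearch itself):

docker run --rm --gpus all nvidia/cuda:11.6.1-base-centos7 nvidia-smi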
The deployed model
{ 
  "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "model_group_id": "URoCjY0BnzFjNeeq8dLX",
  "algorithm": "TEXT_EMBEDDING",
  "model_version": "1",
  "model_format": "TORCH_SCRIPT",
  "model_state": "DEPLOYED",
  "model_content_size_in_bytes": 91790008,
  "model_content_hash_value": "c15f0d2e62d872be5b5bc6c84d2e0f4921541e29fefbef51d59cc10a8ae30e0f",
  "model_config": { - 
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "SENTENCE_TRANSFORMERS",
    "all_config": "{\"_name_or_path\":\"nreimers/MiniLM-L6-H384-uncased\",\"architectures\":[\"BertModel\"],\"attention_probs_dropout_prob\":0.1,\"gradient_checkpointing\":false,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":384,\"initializer_range\":0.02,\"intermediate_size\":1536,\"layer_norm_eps\":1e-12,\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":6,\"pad_token_id\":0,\"position_embedding_type\":\"absolute\",\"transformers_version\":\"4.8.2\",\"type_vocab_size\":2,\"use_cache\":true,\"vocab_size\":30522}"
  },
  "created_time": 1707469090413,
  "last_updated_time": 1707471239264,
  "last_registered_time": 1707469096582,
  "last_deployed_time": 1707471239264,
  "total_chunks": 10,
  "planning_worker_node_count": 1,
  "current_worker_node_count": 1,
  "planning_worker_nodes": [ - 
    "1Gv9j7-8RQKDbMNcgTc35g"
  ],
  "deploy_to_all_nodes": true
}
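To confirm which node actually hosts the model at inference time, the ML Commons profile API can be queried (a sketch, using the model ID referenced by my ingest pipeline below):

GET /_plugins/_ml/profile/models/cE0WjY0B9ALueO2pruv7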
Ingest pipeline definition
{  
  "test-ingest-pipeline": { - 
    "description": "test-ingest-pipeline",
    "processors": [ 
      {
        "set": {
          "field": "text_embedding",
          "value": [  
            ""
          ]
        }
      },
      {  
        "append": { 
          "field": "text_embedding",
          "if": "(ctx['name'] instanceof String)",
          "value": "The name is {{{name}}}"
        }
      },
      { 
        "text_embedding": {  
          "model_id": "cE0WjY0B9ALueO2pruv7",
          "field_map": {  
            "text_embedding": "embedding"
          }
        }
      }
    ]
  }
}
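For completeness, the pipeline can be exercised without indexing real documents via the simulate API (the document fields here are just examples):

POST _ingest/pipeline/test-ingest-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "name": "Pierre"
      }
    }
  ]
}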

@Pigau,

Could you please take a look at this article: GPU acceleration - OpenSearch Documentation

Please let me know if that works.

Thanks
Dhrubo

Hello @dhrubo ,

Thank you for your response.
Yes, I followed this documentation and prepared an NVIDIA ML node as mentioned in it.

However, the PyTorch part is not very clear to me. Do I need to install PyTorch in the OpenSearch image in order to allow OpenSearch to use the GPU?
I ask because I only installed OpenSearch in the image based on nvidia/cuda, and PyTorch seems to be embedded in the OpenSearch package, since my MiniLM model computes vectors without me installing anything else.