ES crashes when indexing KNN Vectors

A fatal error has been detected by the Java Runtime Environment:
elasticsearch | #
elasticsearch | # SIGILL (0x4) at pc=0x00007fb5c03e7f89, pid=1, tid=281
elasticsearch | #
elasticsearch | # JRE version: OpenJDK Runtime Environment AdoptOpenJDK (14.0.1+7) (build 14.0.1+7)
elasticsearch | # Java VM: OpenJDK 64-Bit Server VM AdoptOpenJDK (14.0.1+7, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
elasticsearch | # Problematic frame:
elasticsearch | # C [libKNNIndexV1_7_3_6.so+0x1a1f89] float std::generate_canonical<float, 24ul, std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul> >(std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>&)+0x1c9

When I index vectors (768 dimensions) into a knn_vector field, ES crashes every time.

This happens on both versions 1.9 and 1.8, running in Docker. After the first crash while indexing, I have to remove the Docker volume before the ES container will start again without throwing this error.

Field mapping:

"vector" => [ "type" => "knn_vector", "dimension" => 768 ],

Index settings:

'knn' => true, 'knn.space_type' => 'cosinesimil'
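For anyone trying to reproduce this, the mapping and settings above correspond roughly to the request bodies built below. This is only a sketch: the index name `knn-test`, the random vector values, and the assumption that the two settings sit under the `index` settings group are not from the original report.

```python
import json
import random

INDEX_NAME = "knn-test"  # hypothetical index name, not from the report
DIMENSION = 768          # vector dimension stated in the report


def build_index_body(dimension=DIMENSION):
    """Return a create-index body mirroring the reported settings and mapping.

    Assumes 'knn' and 'knn.space_type' are nested under the 'index' group.
    """
    return {
        "settings": {
            "index": {"knn": True, "knn.space_type": "cosinesimil"}
        },
        "mappings": {
            "properties": {
                "vector": {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


def build_document(dimension=DIMENSION, seed=0):
    """Return a test document holding a random vector (placeholder data)."""
    rng = random.Random(seed)
    return {"vector": [rng.uniform(-1.0, 1.0) for _ in range(dimension)]}


if __name__ == "__main__":
    # Print the requests instead of sending them, so no cluster is needed.
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(build_index_body(), indent=2))
    doc = build_document()
    print(f"POST /{INDEX_NAME}/_doc with a vector of length {len(doc['vector'])}")
```

The script only prints the requests; sending them to the container with any HTTP client should exercise the same indexing path that crashes.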

Hi @adrianpaiva,

This is interesting. From the stack trace, the crash appears to originate in the libKNNIndexV1_7_3_6 library. I ran some indexing tests against the 1.9 Docker image with cosine similarity and a dimension of 768, and nothing broke.

I have a few questions about the environment and workload you are using, so that I can try to reproduce the crash:

  1. Just to confirm, are you using this Docker image for 1.9.0: amazon/opendistro-for-elasticsearch:1.9.0?
  2. Did you follow this guide for installation: Docker - Open Distro Documentation?
  3. What is the architecture and operating system of the machine you are using?
  4. How much RAM does the machine have?
  5. How much storage does your machine have?
  6. Are you changing any of the cluster settings?
  7. How many nodes does the cluster have?
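If it is easier, the answers to questions 3 to 5 can be gathered with a short script like the one below. It is a rough sketch that assumes Python 3 is available on the Docker host; the function name `collect_environment` is made up for illustration, and the RAM figure is only reported on systems that expose `os.sysconf` (Linux and most Unix-likes).

```python
import os
import platform
import shutil


def collect_environment(path="/"):
    """Collect architecture, OS, RAM, and disk figures for a bug report."""
    info = {
        "architecture": platform.machine(),
        "operating_system": platform.platform(),
        "cpu_count": os.cpu_count(),
    }

    # Physical RAM = page size * number of physical pages (Unix only).
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        pages = os.sysconf("SC_PHYS_PAGES")
        info["ram_gib"] = round(page_size * pages / 2**30, 1)
    except (AttributeError, ValueError, OSError):
        info["ram_gib"] = None  # not available on this platform

    # Total and free space on the filesystem containing `path`.
    usage = shutil.disk_usage(path)
    info["disk_total_gib"] = round(usage.total / 2**30, 1)
    info["disk_free_gib"] = round(usage.free / 2**30, 1)
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Run it on the host rather than inside the container, since the container may report limits that differ from the machine's actual resources.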

Jack

Hi @adrianpaiva,

Are you still facing the issue?