AWS OpenSearch Service lost one node during indexing of k-NN vectors

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 1.3 on AWS

Describe the issue:
I have an OpenSearch Service domain on AWS with 2 nodes. During each indexing run with k-NN vectors, one node was lost near the end of indexing. Thanks to the automatic remediation of red clusters, the lost node was restored 30 minutes later.

The vector index contains some text fields and a k-NN field, with more than 2 million documents; the total index size is 11 GB.
I have a Python 3 program that reads data from a source index with scroll and runs the vector indexing with _bulk, roughly as in the sketch below.
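For context, here is a minimal sketch of that flow, assuming the opensearch-py client; the endpoint, credentials, source index name, and batch size are placeholders, not my real values.

from opensearchpy import OpenSearch, helpers

client = OpenSearch(
    hosts=[{"host": "my-domain.eu-west-1.es.amazonaws.com", "port": 443}],  # placeholder endpoint
    http_auth=("user", "password"),  # placeholder credentials
    use_ssl=True,
)

SOURCE_INDEX = "source_index"   # placeholder name
TARGET_INDEX = "my_knn_index"   # name taken from the logs below

def generate_actions():
    # Scroll through the source index and emit one bulk action per document.
    for hit in helpers.scan(client, index=SOURCE_INDEX, scroll="5m", size=500):
        src = hit["_source"]
        yield {
            "_index": TARGET_INDEX,
            "_id": hit["_id"],
            "_source": {
                "title": src.get("title"),
                "text": src.get("text"),
                "vector": src.get("vector"),  # 256-dimension float list
            },
        }

# Send the documents to the _bulk API in batches.
helpers.bulk(client, generate_actions(), chunk_size=500, request_timeout=120)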

How can I avoid losing a node during indexing?

Configuration:
OpenSearch Service domain on AWS: c6g.xlarge.search instance type, 2 data nodes, each with 4 vCPU, 8 GB RAM, 200 GB storage, 6000 IOPS, and 256 MB/s throughput

Schema:

{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 0,
      "knn": true,
      "knn.algo_param.ef_search": 512
    }
  },
  "mappings": {
    "dynamic": "false",
    "properties": {
      "title": {
        "type": "text"
      },
      "text": {
        "type": "text"
      },
      "vector": {
        "type": "knn_vector",
        "dimension": 256,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 2000,
            "m": 16
          }
        }
      }
    }
  }
}
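For completeness, a small sketch of how the index is created with this schema from the same Python program (the schema file name is hypothetical; the client is the one from the sketch above):

import json

# Load the schema shown above (hypothetical file name) and create the k-NN index.
with open("knn_index_schema.json") as f:
    index_body = json.load(f)

client.indices.create(index="my_knn_index", body=index_body)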

Relevant Logs or Screenshots:
(Screenshot: IndexingRate metric from CloudWatch)

Why did the node go down? Do you see anything in the logs?

I don't know the reason. Here are some logs from CloudWatch:

[2023-10-02T21:19:33,271][WARN ][o.o.c.NodeConnectionsService] [19a43ee7408bca1891925f5ed1897d14] failed to connect to {e7736a956ab97de531be4c788aa799ed}{XLPkuGNkSPulv8HsUpJChw}{0DFrYfW3Q8C4mAlZ_gwv5Q}{IP}{IP}{dimr}{dp_version=20210501, distributed_snapshot_deletion_enabled=false, cold_enabled=false, adv_sec_enabled=false, AMAZON_INTERNAL, cross_cluster_transport_address=IP, awareness_features_enabled=true, global_cpu_usage_ac_supported=true, shard_indexing_pressure_enabled=true, AMAZON_INTERNAL, search_backpressure_feature_present=true} (tried [25] times)
ConnectTransportException[[xxx][IP] handshake failed. unexpected remote node {xxx}{xxx}{xxx}{IP}{IP}{dimr}{dp_version=20210501, distributed_snapshot_deletion_enabled=false, cold_enabled=false, adv_sec_enabled=false, AMAZON_INTERNAL, cross_cluster_transport_address=IP, awareness_features_enabled=true, global_cpu_usage_ac_supported=true, shard_indexing_pressure_enabled=true, AMAZON_INTERNAL, search_backpressure_feature_present=true}]

[2023-10-02T21:20:17,508][WARN ][o.o.i.c.IndicesClusterStateService] [xxx] [my_knn_index][0] marking and sending shard failed due to [shard failure, reason [merge failed]]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.RuntimeException: java.lang.RuntimeException: [KNN] Adding footer to serialized graph failed: org.apache.lucene.index.MergePolicy$MergeAbortedException: Merge aborted.

During the indexing, CPUUtilization was around 60%, JVMMemoryPressure was between 80% and 100%, and FreeStorageSpace was 220 GiB.
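For reference, a hedged sketch of how one could check the k-NN plugin's native graph memory per node via the _plugins/_knn/stats endpoint (using the client from the first sketch; stat names follow the k-NN plugin documentation and may vary by version):

# Query the k-NN plugin stats and print per-node native graph memory usage (reported in KB).
stats = client.transport.perform_request("GET", "/_plugins/_knn/stats")
for node_id, node_stats in stats.get("nodes", {}).items():
    print(node_id, node_stats.get("graph_memory_usage"), "KB of native memory used by HNSW graphs")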

@Garance Is it possible to reproduce this in the latest versions (>= 2.5)? We can certainly take a look, but we want to make sure this isn't something we already fixed in 2.x.

@vamshin Thank you for the response; I will test the latest version.
