The coordinator exits the cluster

KolesnikovYA · August 2, 2023, 12:49pm

Hello!

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
I’m using OpenSearch v2.8.0.

Describe the issue:
I am stress-testing an OpenSearch cluster: I send 10 simultaneous requests, each asking for the 10 most similar documents from a kNN index, for 10 different documents. The index holds approximately 18 million documents. After these requests, the coordinator node exits the cluster. CPU utilization is at 75% when the coordinator exits the cluster.
At the same time, the same requests to an identical OpenDistro cluster return results, and everything goes well.
Could you please advise how I can fix this problem?
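
For anyone trying to reproduce this, the load described above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual test script: the field name `my_vector`, the index name, and the base URL are hypothetical placeholders, and the query body follows the standard k-NN plugin `knn` query shape.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request


def build_knn_query(field, vector, k=10):
    """Build a k-NN search body asking for the k nearest neighbours of `vector`."""
    return {"size": k, "query": {"knn": {field: {"vector": list(vector), "k": k}}}}


def send_search(base_url, index, body, timeout=30):
    """POST one search body to <base_url>/<index>/_search and return the parsed JSON."""
    req = request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def run_concurrently(send, bodies, workers=10):
    """Call `send` on every body using `workers` threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, bodies))
```

Usage would look something like `run_concurrently(lambda b: send_search("http://coordinator:9200", "my-knn-index", b), bodies)`, where `bodies` holds 10 query bodies built from the vectors of 10 different documents (host and index name are placeholders).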

Configuration:
Here is my configuration for the OpenSearch cluster (this configuration is identical in OpenDistro):

{
  "persistent": {
    "action": {
      "destructive_requires_name": "true"
    },
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": "none"
        },
        "allocation": {
          "allow_rebalance": "indices_all_active",
          "cluster_concurrent_rebalance": "15",
          "node_concurrent_recoveries": "2",
          "disk": {
            "threshold_enabled": "true",
            "watermark": {
              "low": "85%",
              "high": "90%"
            }
          },
          "enable": "all",
          "node_concurrent_outgoing_recoveries": "2"
        }
      },
      "metadata": {
        "perf_analyzer": {
          "state": "0"
        }
      }
    },
    "knn": {
      "algo_param": {
        "index_thread_qty": "14"
      },
      "cache": {
        "item": {
          "expiry": {
            "enabled": "false",
            "minutes": "1m"
          }
        }
      },
      "circuit_breaker": {
        "triggered": "false"
      },
      "memory": {
        "circuit_breaker": {
          "limit": "80%",
          "enabled": "true"
        }
      }
    },
    "plugins": {
      "index_state_management": {
        "template_migration": {
          "control": "-1"
        }
      }
    }
  },
  "transient": {}
}
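
Since the settings are said to be identical on both clusters, one way to double-check that is to flatten each settings dump into dotted keys and diff them. The sketch below is a hypothetical helper, not part of any OpenSearch client; it assumes both dumps have already been loaded into Python dicts (for example with `json.load`).

```python
def flatten_settings(settings, prefix=""):
    """Flatten nested settings into dotted keys, e.g. 'knn.memory.circuit_breaker.limit'."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_settings(value, path))
        else:
            flat[path] = value
    return flat


def diff_settings(first, second):
    """Return {dotted_key: (value_in_first, value_in_second)} for every differing key."""
    flat_first = flatten_settings(first)
    flat_second = flatten_settings(second)
    return {
        key: (flat_first.get(key), flat_second.get(key))
        for key in sorted(set(flat_first) | set(flat_second))
        if flat_first.get(key) != flat_second.get(key)
    }
```

An empty result from `diff_settings` would confirm that the two dumps match; any entry it returns points at a setting worth looking into.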

And here is the hardware configuration of my OpenSearch cluster:

master_node - 3 VMs, 18 CPU, 370 GB HDD, 270 GB RAM
coordinator_node - 3 VMs, 14 CPU, 45 GB HDD, 254 GB RAM
data_node - 120 VMs, 18 CPU, 368 GB HDD, 270 GB RAM

Relevant Logs or Screenshots:

radu.gheorghe · August 7, 2023, 1:59pm

It would be nice to know why it exited. Anything interesting in its logs? Or does the manager just say it timed out?
