Migrating a data path to a new data node

spapadop · September 18, 2023, 1:40pm

On a server, I have a node with the data role, let’s say data-node1, with the setting below in opensearch.yml:

path.data: "/var/lib/opensearch/data"

Now, I want to stop data-node1, bootstrap another node (e.g. data-node2) and add the same configuration to it, so that it “picks up the work” from where data-node1 has left it:

path.data: "/var/lib/opensearch/data"

That isn’t possible, as it creates a conflict: the path has already been used by another data node. Is there any way to achieve this, other than setting a new data path for data-node2 and draining the old node towards that new data path?
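For reference, the "drain" alternative mentioned above could look roughly like the sketch below. It assumes the old node's `node.name` is `data-node1` and that the cluster is reachable at `http://localhost:9200`; both values, and the helper function names, are placeholders for illustration. It relies on the `cluster.routing.allocation.exclude._name` cluster setting, which asks the cluster to move shards off the named node.

```shell
#!/usr/bin/env bash
# Assumed endpoint; override with the address of any node in the cluster.
OS_URL="${OS_URL:-http://localhost:9200}"

# Build the settings body that excludes one node (by node.name) from allocation.
drain_body() {
  printf '{"persistent": {"cluster.routing.allocation.exclude._name": "%s"}}' "$1"
}

# Send the exclusion to the cluster; shards then relocate to the remaining data nodes.
drain_node() {
  curl -X PUT "$OS_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(drain_body "$1")"
}
```

Calling `drain_node data-node1` would start the relocation; the old node can be stopped once it no longer holds any shards.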

I could have used node.max_local_storage_nodes but firstly it’s deprecated and secondly this isn’t really what I want to achieve.

Many thanks in advance for your input!

Gsmitt · September 19, 2023, 4:01am

Hey @spapadop

I sort of get what you’re saying. What I don’t understand is that each node has its own data path, unless we’re talking about the same node.

For example, if I had Elasticsearch installed and wanted to stop it and install OpenSearch instead, I would do something like this.

ES/OS data path config should be the same.

path.data: "/var/lib/elasticsearch/data"

curl -X PUT "http://hostname-of-an-OpenSearch-node:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{
  "transient" : {
     "cluster.routing.allocation.enable" : "primaries"
  }
}
'

sudo rsync -avP /var/lib/elasticsearch/* /var/lib/opensearch/
sudo chown -R opensearch:opensearch /var/lib/opensearch

Is that what you’re referring to? Or is this data path on a separate/different shared volume?

If not, then each OpenSearch node has its own data path. The way nodes share information is by clustering: you would have master/leader nodes and data nodes.

spapadop · September 20, 2023, 10:02am

Hi @Gsmitt,

Thanks for the response. I had a typo on my above post, fixed now. Let me try to elaborate.

Here are the contents of /var/lib/opensearch/data:

-rw-r--r--. 1 opensearch opensearch  5 Jun 13 09:15 batch_metrics_enabled.conf
-rw-r--r--. 1 opensearch opensearch  5 Jun 13 09:15 logging_enabled.conf
drwxr-xr-x. 3 opensearch opensearch 15 Jun 13 09:15 nodes
-rw-r--r--. 1 opensearch opensearch  5 Jun 13 09:15 performance_analyzer_enabled.conf
-rw-r--r--. 1 opensearch opensearch  5 Jun 13 09:15 rca_enabled.conf
-rw-r--r--. 1 opensearch opensearch  5 Jun 13 09:15 thread_contention_monitoring_enabled.conf

A happy data-node1 is writing data to it, business as usual.

Now, I want to stop data-node1 and retire it. Then, I bootstrap a new node, data-node2, which should continue from where data-node1 left off, i.e. read all the data in /var/lib/opensearch/data, “declare” to the cluster that all respective shards now belong to data-node2 and continue business as usual.

The actual motivation behind this is that the data in /var/lib/opensearch/data may live in a cephfs or s3 cluster and just be mounted on a host. If the above could work, it would enable us to change backend hosts transparently, i.e. kill a rhel8 host that had mounted this data, bootstrap a rhel9 host, mount the same path and continue working happily.

What worries me is that metadata regarding the old data-node1 live within that data and my understanding is that those metadata cannot (at the moment) be simply reset by a new data-node2 connecting to the data path. And probably for a good reason. But it would be nice to explore such possibilities to support use-cases like the one I mention above.

Let me know if something remains unclear and again thanks a lot for the discussion.

Gsmitt · September 21, 2023, 12:49am

Hey @spapadop

OK, I understand now. To use the same index data from node-1 on node-2, it would be best to create a snapshot, then restore it onto node-2 and start services.

Second option
As for bootstrapping, the only thing I can think of is if OpenSearch node-1 had two volumes, like this.

Disk#1

/dev/sda1 (i.e., for the operating system)

Disk #2

/dev/sdb1 (i.e., path.data: /mnt/opensearch_data)

I use the snapshot method and it works great.
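A minimal sketch of the snapshot method, using the standard `_snapshot` REST API. The repository name, snapshot name, endpoint and repository path below are all assumed placeholders, as are the helper function names. It also assumes a shared-filesystem (`fs`) repository, which requires the location to be listed under `path.repo` in opensearch.yml and to be readable by the node that restores.

```shell
#!/usr/bin/env bash
# All four values are illustrative assumptions; adjust them to your setup.
OS_URL="${OS_URL:-http://localhost:9200}"
REPO="${REPO:-migration_repo}"
SNAP="${SNAP:-data_node1_snapshot}"
REPO_PATH="${REPO_PATH:-/mnt/snapshots}"

# URL of the snapshot inside the repository.
snapshot_url() {
  printf '%s/_snapshot/%s/%s' "$OS_URL" "$REPO" "$SNAP"
}

# Register a shared-filesystem snapshot repository.
register_repo() {
  curl -X PUT "$OS_URL/_snapshot/$REPO" \
    -H 'Content-Type: application/json' \
    -d "$(printf '{"type": "fs", "settings": {"location": "%s"}}' "$REPO_PATH")"
}

# Take the snapshot while the old node is still running, and wait for it to finish.
take_snapshot() {
  curl -X PUT "$(snapshot_url)?wait_for_completion=true"
}

# Restore the snapshot once the new node is up and can see the repository.
restore_snapshot() {
  curl -X POST "$(snapshot_url)/_restore"
}
```

The order would be `register_repo` and `take_snapshot` against the old node, then `register_repo` and `restore_snapshot` against the new one. Restoring into a cluster that still has open indices with the same names will fail, so those would need to be closed, deleted or renamed first.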

spapadop · September 26, 2023, 2:43pm

Thanks @Gsmitt, I was suspecting the same. I’ll go ahead accordingly.
