Data Nodes Unable to Reach S3 bucket

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.12

Describe the issue:
I am trying to register an S3 bucket as a snapshot repository on a new cluster with 1 master node and 2 data nodes. The bucket was set up on my old cluster (a single master node), and I want to restore a snapshot from it. Are there any issues with restoring a snapshot from a single-node cluster to a multi-node cluster? Certs have all been verified on the data nodes, and they have the same firewall access as the master node (which has no problems connecting to the S3 bucket).

PUT /_snapshot/devsnapshot/
{
  "type": "s3",
  "settings": {
    "bucket": "opensearch-snapshot",
    "path_style_access": "true",
    "endpoint": "xxxx.com"
  }
}

I am getting the error below:
{
  "error": {
    "root_cause": [
      {
        "type": "repository_verification_exception",
        "reason": "[devsnapshot] [[6zR4OaX5Qx-AeWzGIpjguQ, 'RemoteTransportException[[xxxxx.com][x.x.x.x:xxxx][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[devsnapshot] store location [opensearch-snapshots] is not accessible on the node
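
For anyone reading a long verification error like this one, the node that failed is named inside each `RemoteTransportException[[node][address][internal:admin/repository/verify]]` segment. A rough Python sketch for pulling those out of the `reason` string follows. The helper name `failing_nodes` and the regular expression are my own, based only on the error format shown in this thread, so other OpenSearch versions may format the message differently.

```python
import re
from typing import List, Tuple

# Matches segments such as:
#   RemoteTransportException[[data2.com][x.x.x.x:9300][internal:admin/repository/verify]]
# The pattern is inferred from the error text in this thread (an assumption).
NODE_PATTERN = re.compile(
    r"RemoteTransportException\[\[([^\]]+)\]\[([^\]]+)\]"
    r"\[internal:admin/repository/verify\]\]"
)


def failing_nodes(reason: str) -> List[Tuple[str, str]]:
    """Return (node_name, transport_address) pairs for every node that
    failed repository verification, in the order they appear."""
    return NODE_PATTERN.findall(reason)
```

If the master node never shows up in the returned list while every data node does, the difference is more likely to be in the network path from the data nodes than in the repository definition itself.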

Configuration:
I have already checked that the repository-s3 plugin is installed and that the versions match. I have even uninstalled and reinstalled it.

opensearch.yml has the following S3 configuration:
s3.client.default.endpoint: xxxx.com:xxx
s3.client.default.path_style_access: true
s3.client.default.region: us-wests

The opensearch-keystore keys have been configured correctly. I removed and re-added them just to make sure there were no mistakes.

s3.client.default.access_key
s3.client.default.secret_key

Relevant Logs or Screenshots:

I have also found the following errors in the logs of both of my data nodes:

[2024-05-21T12:48:01,286][WARN ][o.o.r.VerifyNodeRepositoryAction] [xxxx.com] [devsnapshot] failed to verify repository
org.opensearch.repositories.RepositoryVerificationException: [devsnapshot] store location [opensearch-snapshots] is not accessible on the node [{xxxx.com}{6zR4OaX5Qx-AeWzGIpjguQ}{s68GrDgqT5Kql_lbwrcd9Q}{x.x.x.x}{x.x.x.x:xxxx}{di}{shard_indexing_pressure_enabled=true}]

Caused by: java.io.IOException: Unable to upload object [tests-8dC-kZouR6KosIIRs4bxqA/data-6zR4OaX5Qx-AeWzGIpjguQ.dat] using a single upload
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connect to xxxx.com:xxx [XXXX] failed: Connect timed out

Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Unable to execute HTTP request: Connect to xxxx.com:xxx [XXXX] failed: Connect timed out
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Unable to execute HTTP request: Connect to xxxx.com:xxx [XXXX] failed: Connect timed out
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Unable to execute HTTP request: Connect to xxxx.com:xxx [XXXX] failed: Connect timed out
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to xxxx.com:xxx [XXXX] failed: Connect timed out
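
Since the trace ends in a ConnectTimeoutException, one quick way to separate a network problem from a plugin or credentials problem is to try a plain TCP connection from each node to the endpoint. Below is a minimal Python sketch, assuming Python 3 is available on the nodes. `can_connect` is a hypothetical helper name, and the host and port you pass in are whatever your `s3.client.default.endpoint` points to.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This only checks that the port is reachable from this machine. It does not
    check TLS, credentials, or bucket permissions.
    """
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

Calling `can_connect` with the endpoint host and port on the master and on each data node should show whether only the data nodes fail. A False result on the data nodes alone points at routing or firewall rules rather than at OpenSearch.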

I tried the following steps, which I read about online:

  • Uninstall the repository-s3 plugin on all nodes.
  • Remove the access and secret keys from opensearch-keystore.
  • Reboot.
  • Re-install the repository-s3 plugin. (I am running ./opensearch-plugin install repository-s3-2.12.0.zip because I had to download the zip manually due to timeout issues. When installing from the URL, it takes painfully long to make a connection and actually download the file. Not sure if that is a local issue or not.)
  • Re-add the access and secret keys to opensearch-keystore.
  • Restart all nodes.
  • Run the command to create the repository.

I am still getting the same error. The master node's configuration for the S3 bucket is exactly the same as the data nodes', yet the master node is the only one not getting the error; only the data1 and data2 nodes fail. Does anyone have any advice on this? All nodes have the same firewall rules as well.

"type": "repository_verification_exception",
"reason": "[snapshot] [[D_qkrHTGR_WjRO8nNmR1VA, 'RemoteTransportException[[data2.com][x.x.x.x:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[snapshot] store location [opensearch-snapshots] is not accessible on the node [{data2.com}{D_qkrHTGR_WjRO8nNmR1VA}{urkCsePmTqO88_ETbv9Bcg}{x.x.x.x}{x.x.x.x:9300}{di}{shard_indexing_pressure_enabled=true}]]; nested: IOException[Unable to upload object [tests-Wvvo_P-DSYyunr3PGvzURQ/data-D_qkrHTGR_WjRO8nNmR1VA.dat] using a single upload]; nested: NotSerializableExceptionWrapper[sdk_client_exception: Unable to execute HTTP request: Connect to xxxx.com:xxx [xxxx.com/x.x.x.x] failed: Connect timed out]; nested: IOException[Connect to xxxx.com:xxx [xxxx.com/x.x.x.x] failed: Connect timed out]; nested: IOException[Connect timed out];'], [6zR4OaX5Qx-AeWzGIpjguQ, 'RemoteTransportException[[data1.com][x.x.x.x:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[snapshot] store location [opensearch-snapshots] is not accessible on the node [{data1.com}{6zR4OaX5Qx-AeWzGIpjguQ}{rux4b8ajRUukWeqCuhniDQ}{x.x.x.x}{x.x.x.x:9300}{di}{shard_indexing_pressure_enabled=true}]]; nested: IOException[Unable to upload object [tests-Wvvo_P-DSYyunr3PGvzURQ/data-6zR4OaX5Qx-AeWzGIpjguQ.dat] using a single upload]; nested: NotSerializableExceptionWrapper[sdk_client_exception: Unable to execute HTTP request: Connect to xxxx.com:xxx [xxxx.com/x.x.x.x] failed: Connect timed out]; nested: IOException[Connect to xxxx.com:xxx [xxxx.com/x.x.x.x] failed: Connect timed out]; nested: IOException[Connect timed out];']]"

@jsabatel, I am also facing the same issue. Were you able to solve it?

@bshashan I was recently able to resolve it. After days of troubleshooting, I found that my firewall team had not actually opened the required ports to the S3 bucket, even though they said they had. Once this was fixed, my data nodes were able to connect.

  1. Make sure certs are accurate.
  2. Make sure you don’t have any firewalls blocking you!

Ensure that the firewall permits outbound connections from every node to the S3 endpoint, verify that the endpoint URL is correct (including protocol and port), and validate the IAM credentials and permissions for S3 access. Additionally, review the OpenSearch cluster settings to confirm that the repository configuration is consistent across nodes.
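
To make the "protocol and port" part concrete, here is a small Python sketch that splits an endpoint string into scheme, host, and port so you can see exactly what needs to be allowed through the firewall. `parse_endpoint` is a hypothetical helper, not part of OpenSearch. Treating a missing scheme as `https` is an assumption about the S3 client's default protocol, so adjust it if your nodes are configured differently.

```python
from urllib.parse import urlsplit
from typing import Tuple


def parse_endpoint(endpoint: str, default_scheme: str = "https") -> Tuple[str, str, int]:
    """Split an endpoint such as 'host:9000' or 'http://host' into
    (scheme, host, port). Raises ValueError on a missing host or bad port."""
    raw = endpoint.strip()
    # Without a scheme, prefix '//' so urlsplit treats the text as host[:port].
    parts = urlsplit(raw if "://" in raw else "//" + raw)
    scheme = parts.scheme or default_scheme  # assumed default, see note above
    if parts.hostname is None:
        raise ValueError("no host found in endpoint: %r" % endpoint)
    # parts.port raises ValueError itself if the port is not a valid number.
    port = parts.port or (443 if scheme == "https" else 80)
    return scheme, parts.hostname, port
```

The resulting host and port are the pair that every node, not just the master, needs outbound access to.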