Slowness in adding/removing from opensearch-keystore

Versions (OpenSearch/Server OS/):
OpenSearch 3.3.2, Red Hat 8.10

Describe the issue:

I’m seeing that attempts to add to or remove from the keystore using opensearch-keystore on the CLI hang and do not complete:

[root@vdc-vm-0026925 opensearch]# bin/opensearch-keystore list gcs.client.default.credentials_file
keystore.seed

[root@vdc-vm-0026925 opensearch]# bin/opensearch-keystore remove gcs.client.default.credentials_file

When I run this, an opensearch.keystore.tmp file is created in config/:

[root@vdc-vm-0026925 opensearch]# ls -la config/*keystore*
-rw-------. 1 opensearch opensearch 2570 Nov  7 14:41 config/opensearch.keystore
-rw-r-----. 1 root       root          0 Nov  7 16:15 config/opensearch.keystore.tmp

If I run this again while the config/opensearch.keystore.tmp file exists, I get this error:

[root@vdc-vm-0026925 opensearch]# bin/opensearch-keystore remove gcs.client.default.credentials_file
Exception in thread "main" java.nio.file.FileAlreadyExistsException: /usr/share/opensearch/config/opensearch.keystore.tmp
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:213)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:244)
        at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:426)
        at java.base/java.nio.file.Files.newOutputStream(Files.java:215)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:390)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:383)
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:216)
        at org.apache.lucene.backward_codecs.store.EndiannessReverserUtil.createOutput(EndiannessReverserUtil.java:54)
        at org.opensearch.common.settings.KeyStoreWrapper.save(KeyStoreWrapper.java:530)
        at org.opensearch.tools.cli.keystore.RemoveSettingKeyStoreCommand.executeCommand(RemoveSettingKeyStoreCommand.java:70)
        at org.opensearch.tools.cli.keystore.BaseKeyStoreCommand.execute(BaseKeyStoreCommand.java:112)
        at org.opensearch.common.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:110)
        at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138)
        at org.opensearch.cli.MultiCommand.execute(MultiCommand.java:104)
        at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138)
        at org.opensearch.cli.Command.main(Command.java:101)
        at org.opensearch.tools.cli.keystore.KeyStoreCli.main(KeyStoreCli.java:62)
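
The trace above ends in KeyStoreWrapper.save failing to create config/opensearch.keystore.tmp because the zero-byte file left behind by the earlier run is still there. A minimal shell sketch of clearing it before retrying, assuming no other opensearch-keystore process is still writing to that directory; the function name and its argument are illustrative and not part of OpenSearch:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a leftover keystore temp file from a config directory.
# Assumption: no opensearch-keystore command is currently running against this directory.
clear_stale_keystore_tmp() {
  local config_dir="$1"
  local tmp_file="${config_dir}/opensearch.keystore.tmp"
  if [ -e "$tmp_file" ]; then
    # Show what is being removed so the cleanup is visible in logs.
    ls -l "$tmp_file" >&2
    rm -f -- "$tmp_file"
    echo "removed"
  else
    echo "absent"
  fi
}
```

For the layout shown in this post it would be called as clear_stale_keystore_tmp /usr/share/opensearch/config. It only touches the .tmp file and leaves opensearch.keystore itself alone.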

It’s also the same behaviour when trying to overwrite the existing item in the keystore, hanging at this point;

[root@vdc-vm-0026925 opensearch]# bin/opensearch-keystore add-file gcs.client.default.credentials_file gcp-service-account-key.json
Setting gcs.client.default.credentials_file already exists. Overwrite? [y/N]y

Happy to include any relevant config as required. Thanks!

@JakeHardy14 How did you deploy this cluster? Are you changing opensearch.keystore when the service is running?

Hi @pablo, thanks for the response. I’m terraforming some VMs in my organisation’s private cloud, then using an adapted version of the opensearch-project/ansible-playbook repository on GitHub (the community Ansible playbook for the OpenSearch Project) to deploy the service to the VMs.

WRT changing the keystore, I’ve tried both approaches: making changes while the service is running, and intentionally stopping the service before trying to add to or delete from the keystore. From what I’ve seen, both approaches result in the same hang.

@JakeHardy14 I had the same issue when I tried to configure AWS credentials for S3 snapshot.
I decided to recreate the opensearch.keystore before adding any items.

opensearch-keystore create will initialize a new keystore and overwrite the existing one.

This was my approach through Dockerfile.

RUN /usr/share/opensearch/bin/opensearch-keystore create

# Add credentials at image build time (RUN executes during the build, so the variables must be available as build ARG/ENV)
RUN echo "$AWS_ACCESS_KEY_ID" | /usr/share/opensearch/bin/opensearch-keystore add --stdin s3.client.default.access_key
RUN echo "$AWS_SECRET_ACCESS_KEY" | /usr/share/opensearch/bin/opensearch-keystore add --stdin s3.client.default.secret_key

Hi @pablo,

Appreciate the suggestion. Attempting to create a new keystore prompts for confirmation to overwrite when a keystore already exists:

An opensearch keystore already exists. Overwrite? [y/N]

I couldn’t find any option to force the creation, either in the documentation or in the --help output of opensearch-keystore, which might make that approach unusable in an automated install.

(OpenSearch keystore - OpenSearch Documentation)

On a positive note, the add/remove operations against the keystore do eventually complete, although I’ve seen them take as long as ~14 minutes. I ran the following to validate this:

[root@vdc-vm-0026925 opensearch]# date && bin/opensearch-keystore list && bin/opensearch-keystore add-file gcs.client.default.credentials_file gcp-service-account-key.json && bin/opensearch-keystore list && date
Wed 12 Nov 11:22:10 GMT 2025
keystore.seed
gcs.client.default.credentials_file
keystore.seed
Wed 12 Nov 11:36:45 GMT 2025

The run above is the longest I’ve seen so far.

I wonder if there’s anything we can do to improve the speed of the add/remove actions?
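
To compare timings across runs more easily than with date before and after the whole chain, each command can be wrapped so that it reports its own elapsed time. A sketch only; time_step is a hypothetical helper name, and it measures wall-clock seconds without saying anything about where the time goes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, report wall-clock seconds and exit code on stderr.
time_step() {
  local start end rc
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  echo "elapsed=$((end - start))s exit=${rc} cmd=$*" >&2
  # Preserve the wrapped command's exit code so && chains still behave the same.
  return "$rc"
}

# Example, using the same commands as the run above:
# time_step bin/opensearch-keystore list
# time_step bin/opensearch-keystore add-file gcs.client.default.credentials_file gcp-service-account-key.json
```

Timing list and add-file separately would show whether the delay is in reading the keystore, in writing it, or in both.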

@JakeHardy14 It looks like the Dockerfile build gets past this confirmation prompt and recreates the keystore.

I’ve done only deployments of the OS with OpenSearch Operator using an Ansible playbook.

So I understand that your playbook is running the command to recreate the keystore. Is that correct?

Could you share your playbook?

@pablo, the step for adding items to the keystore doesn’t exist in the base Ansible playbook (linked earlier), so we’ve added the following tasks, which run the same commands as on the command line:

- name: Plugins configuration | Ensure the removal of stale keystore before operations
  ansible.builtin.file:
    path: "{{os_home}}/config/opensearch.keystore.tmp"
    state: absent

- name: Plugins configuration | Create/Replace the keystore
  ansible.builtin.command: "{{os_home}}/bin/opensearch-keystore create"
  timeout: 30

- name: Plugins configuration | Add GCP secret key to keystore
  ansible.builtin.command: "{{os_home}}/bin/opensearch-keystore add-file gcs.client.default.credentials_file {{os_home}}/gcp-service-account-key.json"
  ignore_errors: yes

The Ansible logs show that the timeout is reached on the create step, which is consistent with the same command prompting for y/N on the CLI:

TASK [linux/opensearch : Plugins configuration | Create/Replace the keystore] *************************************************************************************************************************************************************
fatal: [Stage-opensearch-data-node-1]: FAILED! => {"changed": false, "msg": "The ansible.builtin.command action failed to execute in the expected time frame (30) and was terminated"}
fatal: [Stage-opensearch-master-0]: FAILED! => {"changed": false, "msg": "The ansible.builtin.command action failed to execute in the expected time frame (30) and was terminated"}
fatal: [Stage-opensearch-data-node-0]: FAILED! => {"changed": false, "msg": "The ansible.builtin.command action failed to execute in the expected time frame (30) and was terminated"}
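
If the create step is blocked on the Overwrite? [y/N] prompt, one option for unattended runs is to supply the answer on standard input and bound the wait explicitly. A sketch, assuming the CLI reads the confirmation from stdin when no terminal is attached (this is not verified anywhere in this thread); run_with_yes is a hypothetical helper, and timeout is the GNU coreutils command:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with a single "y" line on stdin and a time limit.
# Assumption: the wrapped command reads its confirmation prompt from stdin.
run_with_yes() {
  local limit="$1"
  shift
  # timeout exits with status 124 if the command is still running after $limit seconds.
  printf 'y\n' | timeout "$limit" "$@"
}

# Example (path as used earlier in the thread):
# run_with_yes 60 /usr/share/opensearch/bin/opensearch-keystore create
```

In a playbook, the stdin parameter of ansible.builtin.command could serve the same purpose. Either way, the time limit would need to allow for the multi-minute run times reported above, otherwise the task fails on slowness rather than on the prompt.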

Bumping.

I’d be interested to know whether the slow performance of opensearch-keystore is more widespread, and whether there are any suggestions or debugging approaches to avoid ~14 minute wait times for this command to complete.