Operator - how to amend CA certs?

Versions:

OpenSearch - 3.1
OpenSearch Operator - 2.8
 

Describe the issue:

I am getting "PKIX path building failed" errors when trying to connect to an S3 snapshot repository (Scality). I am confident my CA chain is valid, but I cannot figure out the proper way to make it work for an operator-managed OpenSearch cluster.

I have configured:

  • a custom cacerts truststore, passed via -Djavax.net.ssl.trustStore and provided by a ConfigMap
  • /usr/share/opensearch/config/tls-transport/ca.crt and /usr/share/opensearch/config/tls-http/ca.crt, both amended with the chain

but it still fails with the same error.
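"PKIX path building failed" means the JVM could not build a chain from the server's certificate to a root it trusts. Before changing the cluster again, it can help to test the same CA bundle outside OpenSearch. The sketch below uses a hypothetical helper, check_tls_chain, to attempt a TLS handshake that trusts only the given PEM bundle. The endpoint and bundle path are placeholders. It relies on Python's ssl module (OpenSSL), not the JVM's PKIX validator, so a pass is a strong hint rather than proof that the Java side will also succeed.

```python
import socket
import ssl
from typing import Optional


def check_tls_chain(host: str, port: int = 443, cafile: Optional[str] = None,
                    timeout: float = 5.0) -> str:
    """Try a TLS handshake against host:port, trusting ONLY the CAs in cafile
    (or the system defaults when cafile is None), and report the outcome."""
    # With cafile set, create_default_context loads just that bundle, so the
    # check shows whether the bundle alone can validate the server.
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname enables SNI and hostname verification.
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                subject = dict(rdn[0] for rdn in cert.get("subject", ()))
                return f"OK: verified {subject.get('commonName', host)}"
    except ssl.SSLCertVerificationError as exc:
        # The same class of failure as the JVM's "PKIX path building failed".
        return f"VERIFY FAILED: {exc.verify_message}"
    except OSError as exc:
        # DNS errors, refused connections, timeouts and other TLS errors.
        return f"CONNECT FAILED: {exc}"
```

Example with placeholder values: print(check_tls_chain("s3.example.internal", 443, cafile="/path/to/ca-chain.pem")). A VERIFY FAILED result usually means an intermediate or root is missing from the bundle, or the server is not sending its intermediates.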

Configuration:

Relevant Operator yaml snippet:

    security:
      config:
        adminCredentialsSecret:
          name: admin-credentials-secret
        securityConfigSecret:
          name: sso-securityconfig-secret
      tls:
        http:
          caSecret:
            name: casecret
          generate: true
        transport:
          caSecret:
            name: casecret
          generate: true
          perNode: true

Relevant Logs or Screenshots:

    n/a

Please advise.

   

@mkur, are you trying to provide your own CA to the operator so that it creates certificates from that CA? The indentation is not displayed above. Can you post your complete YAML file inside a code block (with any sensitive information redacted)?

Is the cluster not starting at all, or do you only see errors when trying to connect to Scality?

I managed to solve the certificate issue by mounting my own cacerts file over the bundled JDK's default truststore:

    - name: cacerts
      path: /usr/share/opensearch/jdk/lib/security/cacerts
      subPath: cacerts  
      configMap:
        name: cacerts  
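The cacerts file referenced by that ConfigMap is a binary Java truststore, so it belongs under the ConfigMap's binaryData field (base64-encoded) rather than data. Since it replaces the JDK's default truststore, it is typically built by starting from the JDK's own cacerts and importing the Scality chain into it, so that public roots remain trusted. The sketch below uses a hypothetical helper, cacerts_configmap, to emit such a manifest. The name and key default to cacerts to match the snippet above; the key is the file name that subPath selects. kubectl create configmap cacerts --from-file=cacerts=<path> produces an equivalent object.

```python
import base64
from pathlib import Path
from typing import Optional


def cacerts_configmap(cacerts_path: str, name: str = "cacerts", key: str = "cacerts",
                      namespace: Optional[str] = None) -> dict:
    """Build a ConfigMap manifest that carries a binary Java truststore."""
    raw = Path(cacerts_path).read_bytes()
    metadata = {"name": name}
    if namespace:
        metadata["namespace"] = namespace
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": metadata,
        # binaryData values must be base64. Kubernetes caps ConfigMap data at
        # 1 MiB; a JDK cacerts file is usually far smaller than that.
        "binaryData": {key: base64.b64encode(raw).decode("ascii")},
    }
```

Usage: print(json.dumps(cacerts_configmap("/path/to/cacerts"), indent=2)) and pipe the output to kubectl apply -f -, which accepts JSON manifests.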

Now I am facing a data issue: I can write, but I cannot read. Repository verification fails with:

    'RepositoryVerificationException[[k8spoc] Failed to verify repository]; nested:
      RepositoryVerificationException[[k8spoc] Seed read from master.dat was [16\r\nGq9rcqlfQb24oxqDca]
      but expected seed [Gq9rcqlfQb24oxqDcaAarw]];'],
    [Jv2M0JxPRqGwYxBMNh6fCA, 'RemoteTransportException[[pgeo-fun-masters-0]
      [192.168.30.70:9300][internal:admin/repository/verify]]; nested:
      RepositoryVerificationException[[k8spoc] Failed to verify repository]; nested:
      RepositoryVerificationException[[k8spoc] Seed read from master.dat was [16\r\nGq9rcqlfQb24oxqDca]
      but expected seed [Gq9rcqlfQb24oxqDcaAarw]];'],
    [8rdVTN_lRMaonaenRvDfTQ, 'RemoteTransportException[[pgeo-fun-nodes-0]
      [192.168.138.16:9300][internal:admin/repository/verify]]; nested:
      RepositoryVerificationException[[k8spoc] Failed to verify repository]; nested:
      RepositoryVerificationException[[k8spoc] Seed read from master.dat
      was [16\r\nGq9rcqlfQb24oxqDca] but expected seed [Gq9rcqlfQb24oxqDcaAarw]];']]
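The value read back in this log is itself a clue. The expected seed Gq9rcqlfQb24oxqDcaAarw is 22 bytes long, and 22 in hex is 16. The value read back, 16\r\nGq9rcqlfQb24oxqDca, is a "16" size line plus CRLF (4 bytes) followed by the first 18 characters of the seed: exactly 22 bytes. That pattern is consistent with the S3-compatible store keeping the chunked-upload framing (hex size, CRLF, data) as part of the object instead of decoding it. The sketch below reproduces the logged value under two labelled assumptions: a simplified single-chunk framing (real SigV4 aws-chunked bodies may also carry ";chunk-signature=..." extensions, omitted here) and a reader that receives only as many bytes as the original payload length.

```python
def frame_single_chunk(payload: bytes) -> bytes:
    """Simplified chunked framing: hex size, CRLF, data, CRLF, then the
    zero-length terminating chunk. Signature extensions are omitted."""
    return b"%x\r\n" % len(payload) + payload + b"\r\n0\r\n\r\n"


expected_seed = b"Gq9rcqlfQb24oxqDcaAarw"  # seed OpenSearch wrote to master.dat
stored = frame_single_chunk(expected_seed)  # object body if the framing is kept

# Assumption: the reader gets only len(expected_seed) bytes of the stored body.
seed_read = stored[: len(expected_seed)]

print(len(expected_seed), hex(len(expected_seed)))  # 22 0x16
print(seed_read)                                    # b'16\r\nGq9rcqlfQb24oxqDca'
```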

@mkur, how did you register the S3 repo? Have you tried disabling chunked encoding (setting disable_chunked_encoding to true)? See the example below:

    PUT /_snapshot/test
    {
      "type": "s3",
      "settings": {
        "bucket": "<your-bucket>",
        "base_path": "<your/prefix>",
        "disable_chunked_encoding": true,
        "path_style_access": true
      }
    }
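The same registration can be scripted against the REST API. The sketch below uses only Python's standard library and a hypothetical helper, s3_repo_request, that builds the PUT /_snapshot/<repo> request with the settings from the example above. The URL, repository name, bucket, prefix and credentials are placeholders, and it assumes basic-auth access (for example, the admin user from the operator's admin credentials secret).

```python
import base64
import json
import urllib.request


def s3_repo_request(base_url: str, repo: str, bucket: str, base_path: str,
                    user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) PUT /_snapshot/<repo> for an S3 repository."""
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "base_path": base_path,
            "disable_chunked_encoding": True,
            "path_style_access": True,
        },
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )
```

To send it, pass an SSL context that trusts the cluster's HTTP CA, e.g. urllib.request.urlopen(req, context=ssl.create_default_context(cafile="/path/to/http-ca.crt")). Building and sending are kept separate so the request can be inspected first.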

Thanks @Anthony.

I tried this and now it works: I can write and read snapshots.

There’s still a glitch (on repo creation):



    {
      "error": {
        "root_cause": [
          {
            "type": "s3_exception",
            "reason": "s3_exception: The Content-MD5 you specified did not match what we received. (Service: S3, Status Code: 400, Request ID: 7a0fffccda46a0594620, Extended Request ID: 7a0fffccda46a0594620) (SDK Attempt Count: 1)"
          }
        ],
        "type": "repository_verification_exception",
        "reason": "[k8spoc] cannot delete test data at [pgeofun]",
        "caused_by": {
          "type": "runtime_exception",
          "reason": "runtime_exception: java.util.concurrent.CompletionException: software.amazon.awssdk.services.s3.model.S3Exception: The Content-MD5 you specified did not match what we received. (Service: S3, Status Code: 400, Request ID: 7a0fffccda46a0594620, Extended Request ID: 7a0fffccda46a0594620) (SDK Attempt Count: 1)",
          "caused_by": {
            "type": "completion_exception",
            "reason": "completion_exception: software.amazon.awssdk.services.s3.model.S3Exception: The Content-MD5 you specified did not match what we received. (Service: S3, Status Code: 400, Request ID: 7a0fffccda46a0594620, Extended Request ID: 7a0fffccda46a0594620) (SDK Attempt Count: 1)",
            "caused_by": {
              "type": "s3_exception",
              "reason": "s3_exception: The Content-MD5 you specified did not match what we received. (Service: S3, Status Code: 400, Request ID: 7a0fffccda46a0594620, Extended Request ID: 7a0fffccda46a0594620) (SDK Attempt Count: 1)"
            }
          }
        }
      },
      "status": 500
    }
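For context on this error: the Content-MD5 request header carries the base64-encoded 128-bit MD5 digest of the request body (RFC 1864). The server recomputes the digest over the bytes it actually received and rejects the request with a 400 if the two differ, so any mismatch between the bytes the client hashed and the bytes it sent triggers it. A minimal sketch of that check follows; content_md5 and server_accepts are hypothetical helper names, and the body is a placeholder.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Content-MD5 value: base64 of the raw 16-byte MD5 digest (not the hex string)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def server_accepts(header_value: str, received_body: bytes) -> bool:
    """Server-side check: recompute over the received bytes and compare."""
    return content_md5(received_body) == header_value


body = b"example request body"
header = content_md5(body)
print(server_accepts(header, body))            # True
print(server_accepts(header, body + b"\r\n"))  # False: the received bytes differ
```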

@mkur, this error is a known bug in OpenSearch 3.1. It was fixed in this PR, which landed in OpenSearch 3.3.