Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
From AWS Console:
Version Info = OpenSearch 2.17 (latest), Successfully updated to service software version OpenSearch_2_17_R20241112-P5
Dashboard = You want the url to the dashboard?
OS = Windows 11.
Browser = Google Chrome
Describe the issue:
I deploy the entire stack with OpenTofu (Terraform) IaC. I use the “opensearch_role” and “opensearch_roles_mapping” resources to create a new role and map backend roles to the IAM roles the stack sets up.
The OpenSearch domain is set up to use that stack's Cognito pool.
There is a group for ‘master’ user access to the OpenSearch instance, plus one for users and one for admins. The Master and Pipeline IAM roles are mapped to the all_access and security OpenSearch roles. The User and Admin roles are mapped to the newly created “application_user” OpenSearch role.
Only one out of 10 instances shows the error today. Error = “### Missing Role No roles available for this user, please contact your system administrator.”
When I edit the security configuration, set a master user with a password, drop the Cognito settings, and clear out the access policy, I am able to log in and see that there is no role mapping. If I manually add the mappings back in, then edit the security configuration back to the IAM master, select the Cognito pool, and put the access policy in again, everything works again: both Dashboard access and the applications' access to the indices.
Configuration:
(NOTE: all of the below has been obfuscated a bit)
Access policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::[Acc#]:role/IAM-Master-Role-stack1",
          "arn:aws:iam::[Acc#]:role/PipelineRole-stack1"
        ]
      },
      "Action": [
        "es:DescribeDomain",
        "es:ESHttp*"
      ],
      "Resource": "arn:aws:es:ap-southeast-2:[Acc#]:domain/domain-stack1/*"
    }
  ]
}
Policy in the IAM Master:
{
  "Statement": [
    {
      "Action": [
        "osis:CreatePipeline",
        "osis:UpdatePipeline",
        "osis:DeletePipeline",
        "osis:StartPipeline",
        "osis:StopPipeline",
        "osis:ListPipelines",
        "osis:GetPipeline",
        "osis:GetPipelineChangeProgress",
        "osis:ValidatePipeline",
        "osis:GetPipelineBlueprint",
        "osis:ListPipelineBlueprints",
        "osis:TagResource",
        "osis:UntagResource",
        "osis:ListTagsForResource",
        "osis:Ingest"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:osis:ap-southeast-2:acct#:pipeline/pipe-stack1"
    },
    {
      "Action": "iam:CreateServiceLinkedRole",
      "Condition": {
        "StringLike": {
          "iam:AWSServiceName": "osis.amazonaws.com"
        }
      },
      "Effect": "Allow",
      "Resource": "arn:aws:iam::*:role/aws-service-role/osis.amazonaws.com/AWSServiceRoleForAmazonOpenSearchIngestionService"
    },
    {
      "Action": "es:*",
      "Effect": "Allow",
      "Resource": "arn:aws:es:ap-southeast-2:acct#:domain/domain-stack1/*"
    }
  ],
  "Version": "2012-10-17"
}
Master role also has AWS managed policies AWSAppSyncInvokeFullAccess and AWSAppSyncSchemaAuthor.
The Admin and User roles have a policy with:
{
  "Action": [
    "es:ESHttpGet",
    "es:ESHttpPost"
  ],
  "Effect": "Allow",
  "Resource": "arn:aws:es:ap-southeast-2:acct#:domain/domain-stack1/*"
},
{
  "Action": [
    "iam:PassRole"
  ],
  "Effect": "Allow",
  "Resource": [
    "arn:aws:iam::acct#:role/Master-Role-stack1",
    "arn:aws:iam::acct#:role/IAM-Admin-Role-stack1",
    "arn:aws:iam::acct#:role/IAM-User-Role-stack1"
  ]
},
Relevant Logs or Screenshots:
All I could show is the roles with no backend roles assigned, whereas the other stacks have them all.
Where should I get logs from, or what should I start logging, to identify the cause of these role mappings disappearing?
I am only now adding the search and index slow logs, error logs and audit logs. Would any of these logs capture the OpenSearch instance being reverted to a new instance, or someone hacking in and deleting the backend role mappings?
Perhaps I need to add snapshot management to the stacks to create backups to restore from when it next happens…
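Until the cause is found, a periodic check could at least flag when the mappings vanish. Below is a minimal Python sketch, assuming the payload has the shape returned by the Security plugin's GET _plugins/_security/api/rolesmapping (role name mapped to an object with a "backend_roles" list). The function name, the account ID 123456789012 and the sample payload are made up for illustration; fetching the real payload with a signed request is left out.

```python
import json

# Placeholder ARNs; 123456789012 is an illustrative account ID, not a real one.
MASTER = "arn:aws:iam::123456789012:role/IAM-Master-Role-stack1"
PIPELINE = "arn:aws:iam::123456789012:role/PipelineRole-stack1"
ADMIN = "arn:aws:iam::123456789012:role/IAM-Admin-Role-stack1"
USER = "arn:aws:iam::123456789012:role/IAM-User-Role-stack1"

# What the IaC is supposed to have created: OpenSearch role -> backend roles.
EXPECTED = {
    "all_access": [MASTER, PIPELINE],
    "application_user": [ADMIN, USER],
}


def missing_backend_roles(rolesmapping, expected):
    """Return {role: [expected backend roles absent from the mapping]}.

    A role that has no mapping entry at all is treated as having no
    backend roles, so every expected backend role is reported for it.
    """
    missing = {}
    for role, wanted in expected.items():
        actual = set(rolesmapping.get(role, {}).get("backend_roles", []))
        gap = sorted(set(wanted) - actual)
        if gap:
            missing[role] = gap
    return missing


# Hypothetical payload imitating the broken stack: the application_user
# mapping exists but its backend roles are gone.
sample = json.loads(json.dumps({
    "all_access": {"backend_roles": [MASTER, PIPELINE], "hosts": [], "users": []},
    "application_user": {"backend_roles": [], "hosts": [], "users": []},
}))

report = missing_backend_roles(sample, EXPECTED)
print(json.dumps(report, indent=2))
```

Running something like this on a schedule against each of the 10 stacks and alerting on a non-empty report would narrow down when the mappings disappear, which can then be lined up against the audit logs and the domain's service software update history.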