OpenSearch Dashboard 2.4.0 unable to start

I seem to be unable to create any new topics under the “opensearch dashboard” and I am unable to reply to:

which seems to be exactly my issue.

After upgrading my cluster from OpenSearch 1.3.6 to 2.4.0, everything seemed to be running healthy. However, after upgrading OpenSearch Dashboard to 2.4, the logs state that:

StatusCodeError: Authorization Exception
    at respond (/usr/share/opensearch-dashboards/node_modules/elasticsearch/src/lib/transport.js:349:15)
    at checkRespForFailure (/usr/share/opensearch-dashboards/node_modules/elasticsearch/src/lib/transport.js:306:7)
    at HttpConnector.<anonymous> (/usr/share/opensearch-dashboards/node_modules/elasticsearch/src/lib/connectors/http.js:173:7)
    at IncomingMessage.wrapper (/usr/share/opensearch-dashboards/node_modules/lodash/lodash.js:4991:19)
    at IncomingMessage.emit (events.js:412:35)
    at IncomingMessage.emit (domain.js:475:12)
    at endReadableNT (internal/streams/readable.js:1333:12)
    at processTicksAndRejections (internal/process/task_queues.js:82:21) {
  status: 403,
  displayName: 'AuthorizationException',
  path: '/_plugins/_security/tenantinfo',
  query: {},
  body: undefined,
  statusCode: 403,
  response: '',
  toString: [Function (anonymous)],
  toJSON: [Function (anonymous)]
} | type=log @timestamp=2022-12-16T01:54:24Z tags=["error","plugins","securityDashboards"] pid=1

The Docker container logs then show repeated polling of the /api/status endpoint, which returns 401 until it eventually gives up:

GET /api/status 401 14ms - 9.0B | type=response @timestamp=2022-12-16T01:54:27Z tags=["api"] pid=1 method=get statusCode=401 req={"url":"/api/status","method":"get","headers":{"host":"localhost:5601","user-agent":"curl/7.79.1","accept":"*/*"},"remoteAddress":"127.0.0.1","userAgent":"curl/7.79.1"} res={"statusCode":401,"responseTime":14,"contentLength":9}

GET /api/status 401 3ms - 9.0B | type=response @timestamp=2022-12-16T01:54:37Z tags=["api"] pid=1 method=get statusCode=401 req={"url":"/api/status","method":"get","headers":{"host":"localhost:5601","user-agent":"curl/7.79.1","accept":"*/*"},"remoteAddress":"127.0.0.1","userAgent":"curl/7.79.1"} res={"statusCode":401,"responseTime":3,"contentLength":9}

We do have multi-tenancy enabled and in use. I also tried rolling back to 2.0.0 and forward to 2.4.1, but both have the same issue. I can no longer roll the dashboard back to 1.3.6, as all of the other components in the stack are now on 2.4.0, which makes it incompatible.

I have double-checked that the user and password in opensearch_dashboard.yml are correct, and they do let me curl the OpenSearch nodes, which return:

{
  "name" : "name",
  "cluster_name" : "cluster name",
  "cluster_uuid" : "cluster UUID",
  "version" : {
    "number" : "7.10.2",
    "build_type" : "tar",
    "build_hash" : "744ca260b892d119be8164f48d92b8810bd7801c",
    "build_date" : "2022-11-15T04:42:29.671309257Z",
    "build_snapshot" : false,
    "lucene_version" : "9.4.1",
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}

The behaviour tells me that something changed with the multi-tenancy plugin (the resource at _plugins/_security/tenantinfo), but the error is fairly generic and doesn’t give me any hints on what to investigate. Any help would be greatly appreciated.

Thanks!

Turns out this error is likely not fatal.

I realized our Docker container's health check was an unauthenticated curl to
https://localhost:{port}/api/status

which previously returned some non-fatal status code, but in this version it returns a 401, so the container would fail the health check and eventually be terminated before the dashboard could finish starting up.

So our options are to either allow-list the /api/status endpoint or supply the proper credentials to the container's health check action.
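
For anyone landing here with the same problem, here is a rough sketch of the second option: a credentialed health check written in Python instead of curl. It is only an illustration, not what we actually deployed. The environment variable names (DASHBOARDS_HEALTH_URL, DASHBOARDS_HEALTH_USER, DASHBOARDS_HEALTH_PASSWORD) are made up for this example, port 5601 is taken from the log lines above, and it assumes that the dashboard accepts HTTP Basic credentials on /api/status and that Python is available in the image.

```python
import base64
import os
import ssl
import urllib.error
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def is_healthy(url: str, user: str, password: str,
               timeout: float = 5.0, verify_tls: bool = True) -> bool:
    """Return True only if the status endpoint answers with a 2xx code."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(user, password)}
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: a self-signed certificate on localhost; skip verification.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers 401/403 (HTTPError), refused connections, TLS errors and timeouts.
        return False


def main() -> int:
    """Read the hypothetical settings from the environment; return an exit code."""
    url = os.environ.get("DASHBOARDS_HEALTH_URL", "https://localhost:5601/api/status")
    user = os.environ.get("DASHBOARDS_HEALTH_USER", "")
    password = os.environ.get("DASHBOARDS_HEALTH_PASSWORD", "")
    return 0 if is_healthy(url, user, password, verify_tls=False) else 1
```

The health check action would then exit with the value returned by main(), so that 0 means healthy and 1 means unhealthy. Any non-2xx answer, including the 401 shown in the logs above, counts as unhealthy, so wrong credentials still fail the check instead of being silently ignored.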