Reindex job failing with search phase execution exception

makam.sreekanth · September 11, 2023, 9:13am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 1.0

Describe the issue:

A reindex job is failing with search_phase_execution_exception. We tried decreasing the batch size from the default of 1000 to 100, but we still see the same issue. Are there any other options we can try?

Configuration:

Relevant Logs or Screenshots:

gaobinlong · September 12, 2023, 5:17am

Can you share more information about your problem, such as the full log of the search_phase_execution_exception and the reindex parameters?

makam.sreekanth · September 13, 2023, 9:13am

The API call is very simple, as shown below. We are running 10 parallel reindex jobs, and we hit this error under load.

POST /_reindex
{
   "source":{
      "index":"sourceIndex",
      "size": 100
   },
   "dest":{
      "index":"destIndex"
   }
}

Below is the response from the task API for that reindex operation:

{
  "completed": true,
  "task": {
    "node": "abc",
    "id": 182462425,
    "type": "transport",
    "action": "indices:data/write/reindex",
    "status": {
      "total": 1629142,
      "updated": 0,
      "created": 128300,
      "deleted": 0,
      "batches": 1283,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1.0,
      "throttled_until_millis": 0
    },
    "description": "reindex from [sourceIndex] to [destIndex1][_doc]",
    "start_time_in_millis": 1694273514552,
    "running_time_in_nanos": 1339667737279,
    "cancellable": true,
    "headers": {}
  },
  "error": {
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": -1,
        "index": null,
        "reason": {
          "type": "search_context_missing_exception",
          "reason": "No search context found for id [61314384]"
        }
      }
    ],
    "caused_by": {
      "type": "search_context_missing_exception",
      "reason": "No search context found for id [61314384]"
    }
  }
}

radu.gheorghe · September 15, 2023, 8:14am

This likely means that the underlying scroll expired. You could increase the scroll timeout (which only has to cover the processing of a single page), but the default scroll should already be 5m. So I'm assuming your OpenSearch cluster can't keep up with the load and can't process a page within 5 minutes…
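To confirm that a failed task really hit an expired scroll rather than some other search failure, something like the sketch below could be run against the task response. The function name scroll_expired is made up for illustration, and the sample dictionary is a trimmed copy of the response posted above; only the field names that appear in that response are assumed.

```python
def _error_type(obj):
    """Return the "type" field of an error object, or None if it is not a dict."""
    return obj.get("type") if isinstance(obj, dict) else None


def scroll_expired(task_response):
    """Return True if a task response reports a missing search (scroll) context.

    Hypothetical helper: it only inspects the fields seen in the response
    posted in this thread ("error", "caused_by", "failed_shards").
    """
    error = task_response.get("error") or {}
    candidates = [error.get("caused_by")]
    for shard in error.get("failed_shards") or []:
        if isinstance(shard, dict):
            candidates.append(shard.get("reason"))
    return any(
        _error_type(c) == "search_context_missing_exception" for c in candidates
    )


# Trimmed copy of the task response from the original post.
response = {
    "completed": True,
    "error": {
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "failed_shards": [
            {
                "shard": -1,
                "index": None,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [61314384]",
                },
            }
        ],
    },
}

print(scroll_expired(response))  # → True
```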

Maybe you can run fewer reindexing jobs in parallel?
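One way to cap the parallelism on the client side is a small worker pool. This is only a sketch: run_reindex_jobs and fake_submit are hypothetical names, and fake_submit stands in for whatever code actually starts a reindex and waits for it to finish. The index names are placeholders.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_reindex_jobs(jobs, submit, max_parallel=2):
    """Call submit(job) for every job, with at most max_parallel running at once.

    submit is assumed to block until its reindex job has finished.
    Results are returned in the same order as jobs.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(submit, jobs))


# Stand-in for a real reindex call; it only records how many jobs overlap.
stats = {"active": 0, "peak": 0}
lock = threading.Lock()


def fake_submit(job):
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # pretend the reindex takes some time
    with lock:
        stats["active"] -= 1
    return f"{job['source']} -> {job['dest']}"


# Ten placeholder jobs, mirroring the ten parallel jobs in the question.
jobs = [{"source": f"source-{i}", "dest": f"dest-{i}"} for i in range(10)]
results = run_reindex_jobs(jobs, fake_submit, max_parallel=2)
print(len(results), "jobs finished, peak concurrency:", stats["peak"])
```

With max_parallel=2, the ten jobs from the question would run two at a time instead of all at once, which should give each scroll page more room to finish before its context times out.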

yeonghyeonKo · August 19, 2024, 5:49am

@makam.sreekanth

As @radu.gheorghe said, when you create heavy reindexing tasks like this one:

POST _reindex?wait_for_completion=false
{
  "conflicts": "proceed",
  "source": {
    "index": ["abc.prd.reindex_2020"],
    "size": 100
  },
  "dest": {
    "index": "search.abc.prd",
    "version_type": "external"
  },
  "script": {
    "source": """
      ctx._source.DELETE_YN = 'N';
    """,
    "lang": "painless"
  }
}

you should add the scroll query parameter to the request.
Increase it from the default (5m) to something larger (for example, 1d), or decrease the "size" in "source".
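As a rough illustration of where the scroll parameter goes, the sketch below builds the request path and body for such a call. build_reindex_request is a hypothetical helper, not part of any client library, and it assumes that the reindex endpoint accepts scroll as a URL query parameter alongside wait_for_completion. The index names are taken from the request above.

```python
import json
from urllib.parse import urlencode


def build_reindex_request(source_index, dest_index, scroll="30m", batch_size=100):
    """Return (path, body) for a reindex call with an explicit scroll timeout.

    Hypothetical helper; the "30m" default is an arbitrary example value.
    """
    params = urlencode({"scroll": scroll, "wait_for_completion": "false"})
    body = {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }
    return f"/_reindex?{params}", json.dumps(body)


path, body = build_reindex_request(
    "abc.prd.reindex_2020", "search.abc.prd", scroll="1d"
)
print("POST", path)  # → POST /_reindex?scroll=1d&wait_for_completion=false
print(body)
```

A very long scroll keeps search contexts open on the cluster for that long if a job stalls, so a value closer to the real per-page processing time may be a safer starting point than 1d.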
