Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 1.0
Describe the issue:
Our reindex job is failing with search_phase_execution_exception. We tried decreasing the batch size from the default of 1000 to 100 and still see the same issue. Are there any other options to try?
Configuration:
Relevant Logs or Screenshots:
Can you share more information about the problem, such as the full log of the search_phase_execution_exception and the reindex parameters?
The API call is very simple, as shown below. We are running 10 parallel reindex jobs, and under load we hit this error:
POST /_reindex
{
"source":{
"index":"sourceIndex",
"size": 100
},
"dest":{
"index":"destIndex"
}
}
Below is the response from the Tasks API for that reindex operation:
{
"completed": true,
"task": {
"node": "abc",
"id": 182462425,
"type": "transport",
"action": "indices:data/write/reindex",
"status": {
"total": 1629142,
"updated": 0,
"created": 128300,
"deleted": 0,
"batches": 1283,
"version_conflicts": 0,
"noops": 0,
"retries": { "bulk": 0, "search": 0 },
"throttled_millis": 0,
"requests_per_second": -1.0,
"throttled_until_millis": 0
},
"description": "reindex from [sourceIndex] to [destIndex1][_doc]",
"start_time_in_millis": 1694273514552,
"running_time_in_nanos": 1339667737279,
"cancellable": true,
"headers": {}
},
"error": {
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": -1,
"index": null,
"reason": { "type": "search_context_missing_exception", "reason": "No search context found for id [61314384]" }
}
],
"caused_by": { "type": "search_context_missing_exception", "reason": "No search context found for id [61314384]" }
}
}
This likely means that the underlying scroll expired. You could increase the scroll timeout (which only has to cover the processing of each page), but the default scroll should be 5m. So I’m assuming your OpenSearch cluster can’t keep up with the load, i.e. that it can’t process a page within 5 minutes…
Maybe you can run fewer reindexing jobs in parallel?
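As a rough check on that assumption, the numbers in the task response above can be turned into an average time per batch. This is only a sketch in Python using the values from the posted response; the variable names are mine, and an average says nothing about how long the slowest page took.

```python
# Values copied from the task API response posted above.
running_time_in_nanos = 1_339_667_737_279
batches = 1283
created = 128_300
total = 1_629_142

running_seconds = running_time_in_nanos / 1e9

# Average wall-clock time spent per scroll page (search + bulk write).
avg_seconds_per_batch = running_seconds / batches

# Sanity check that the batch size of 100 was actually applied.
docs_per_batch = created / batches

# How far the job got before the scroll context went missing.
fraction_done = created / total

print(f"ran for {running_seconds / 60:.1f} minutes")
print(f"average time per batch: {avg_seconds_per_batch:.2f} s")
print(f"documents per batch: {docs_per_batch:.0f}")
print(f"progress before failure: {fraction_done:.1%}")
```

On these figures the average batch took about one second, far below 5m. That would point to an occasional page stalling under load rather than every page being slow, though the response alone cannot confirm it.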
@makam.sreekanth
As @radu.gheorghe said, when you create heavy reindexing tasks like this:
POST _reindex?wait_for_completion=false
{
"conflicts": "proceed",
"source": {
"index": ["abc.prd.reindex_2020"],
"size": 100
},
"dest": {
"index": "search.abc.prd",
"version_type": "external"
},
"script": {
"source": """
ctx._source.DELETE_YN= 'N';
""",
"lang": "painless"
}
}
you should add the scroll query parameter to the request.
Please increase it from the default interval (5m) to a larger one (1d), or decrease the batch size (size under source).
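To make that suggestion concrete, here is a minimal sketch in Python that only builds the request path and body with an explicit scroll timeout; it does not send anything. build_reindex_request is a hypothetical helper (not part of any client library), the index names are the placeholders from the original question, and 1d is just the value suggested above, not a tuned recommendation.

```python
import json
from urllib.parse import urlencode


def build_reindex_request(source_index, dest_index, scroll="1d", batch_size=100):
    """Return (method, path, body) for a _reindex call with an explicit scroll.

    Hypothetical helper for illustration only; it does not contact a cluster.
    """
    # scroll controls how long each search context is kept alive between pages.
    params = urlencode({"scroll": scroll, "wait_for_completion": "false"})
    body = {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }
    return "POST", f"/_reindex?{params}", body


method, path, body = build_reindex_request("sourceIndex", "destIndex")
print(method, path)
print(json.dumps(body, indent=2))
```

The printed path and body can be pasted into a console or sent with any HTTP client. A long scroll keeps search contexts open for longer on the cluster, so it is probably worth combining with fewer parallel jobs rather than using it as the only fix.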