Continuously seeing "[monitor_only mode] cancelling task [165525] due to high resource consumption [heap usage exceeded [257mb >= 550kb]]" messages in my cluster

chirumanem · April 4, 2024, 6:46am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.9.0

Describe the issue:
We performed some search operations and noticed that logs appear in OpenSearch with a delay.

In the logs of one of the data containers, we observed messages like the following:

"message":"[monitor_only mode] cancelling task [165525] due to high resource consumption [heap usage exceeded [257mb >= 550kb]]"
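To count or group these messages, the fields can be pulled out of each line. This is a minimal sketch: the pattern is inferred only from the single message quoted in this thread, and the function name `parse_cancellation` is hypothetical.

```python
import re

# Pattern inferred from the one log message quoted in this thread; other
# OpenSearch versions or resource trackers may word the message differently.
CANCEL_RE = re.compile(
    r"\[(?P<mode>\w+) mode\] cancelling task \[(?P<task_id>\d+)\] "
    r"due to high resource consumption "
    r"\[heap usage exceeded \[(?P<used>[\d.]+[a-z]+) >= (?P<allowed>[\d.]+[a-z]+)\]"
)


def parse_cancellation(message: str):
    """Return mode, task id, and the two sizes from a log message, or None."""
    match = CANCEL_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["task_id"] = int(fields["task_id"])
    return fields


line = ("[monitor_only mode] cancelling task [165525] due to high resource "
        "consumption [heap usage exceeded [257mb >= 550kb]]")
print(parse_cancellation(line))
```

Lines that do not match, such as "Running full sweep", return `None`, so the function can be mapped over a whole log file.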

We also observed other log messages, such as “Running full sweep” and “Will delay 20535 miliseconds for next execution”:

"message":"Finished executing attempt_transition_step for .opendistro-job-scheduler-lock"
"message":"Will delay 20535 miliseconds for next execution of job adp-app-logs-2024.04.01"
"message":"Executing attempt_transition_step for adp-app-logs-2024.04.01"
"message":"Finished executing attempt_transition_step for adp-app-logs-2024.04.01"
"message":"Running full sweep"

I understand that these cancellation messages come from search backpressure, but according to the OpenSearch documentation, search backpressure runs in monitor_only mode by default, and “monitor_only” mode doesn’t actually reject requests:

Search backpressure modes

Search backpressure runs in monitor_only (default), enforced, or disabled mode. In the enforced mode, the server rejects search requests. In the monitor_only mode, the server does not actually cancel search requests but tracks statistics about them.

According to the provided logs and my node statistics, the search backpressure mode is set to “monitor_only”, so the requests should not have been rejected, correct?
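One way to double-check the effective mode is to read `search_backpressure.mode` from the cluster settings API. The sketch below assumes the response of `GET _cluster/settings?include_defaults=true&flat_settings=true`; the helper names and the base URL are placeholders, and a cluster with the security plugin enabled would also need TLS and credentials, which are left out here.

```python
import json
import urllib.request

SETTING = "search_backpressure.mode"


def effective_mode(settings: dict) -> str:
    """Pick the mode from a flat cluster-settings response.

    Transient settings override persistent ones, which override defaults.
    """
    for scope in ("transient", "persistent", "defaults"):
        value = settings.get(scope, {}).get(SETTING)
        if value is not None:
            return value
    return "unknown"


def fetch_settings(base_url: str) -> dict:
    """Fetch cluster settings; base_url is a placeholder such as http://localhost:9200."""
    url = f"{base_url}/_cluster/settings?include_defaults=true&flat_settings=true"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Hand-written sample response, not output captured from a real cluster.
sample = {"persistent": {}, "transient": {}, "defaults": {SETTING: "monitor_only"}}
print(effective_mode(sample))
```

Against a live cluster, the call would be `effective_mode(fetch_settings("http://localhost:9200"))`, with the URL adjusted to the actual endpoint.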

Can someone please help me understand the following:
1) Even when search backpressure is in monitor_only mode, why does OpenSearch log “cancelling task [165525] due to high resource consumption [heap usage exceeded [257mb >= 550kb]]” messages? And is there a way to avoid similar issues in the future?
2) What is the reason for the constant “Will delay 20535 miliseconds for next execution of” messages on the OpenSearch data nodes?
3) Which tasks does OpenSearch keep cancelling?
4) The logs say heap usage exceeded [257mb >= 550kb]. What does the 550kb refer to (is it configured somewhere)?
5) Is it possible to determine when a particular log entry was saved into the corresponding index?
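For question 4, the arithmetic can be sketched under assumptions that this thread does not confirm. The OpenSearch documentation lists, for search shard tasks, a `heap_percent_threshold` of 0.5% and a `heap_variance` of 2.0, and describes a task as a cancellation candidate when its heap usage reaches the rolling average of recent tasks multiplied by the variance. Assuming those defaults apply, and assuming the right-hand figure in the log is that rolling average times the variance, the 550kb would be a computed value and not a directly configured one. The function name and the 275 KiB average below are hypothetical, chosen only to reproduce the logged figure.

```python
KIB = 1024
MIB = 1024 * KIB


def is_cancellation_candidate(task_heap, heap_size, moving_average,
                              percent_threshold=0.005, variance=2.0):
    """Return (candidate, threshold_bytes, allowed_bytes) for one task.

    The defaults are the documented search shard task values; a cluster may
    override them, and search tasks use different defaults.
    """
    threshold = percent_threshold * heap_size  # share of the JVM heap
    allowed = moving_average * variance        # rolling average times variance
    return (task_heap >= threshold and task_heap >= allowed), threshold, allowed


candidate, threshold, allowed = is_cancellation_candidate(
    task_heap=257 * MIB,        # left-hand figure from the log message
    heap_size=4096 * MIB,       # data node heap from the configuration
    moving_average=275 * KIB,   # hypothetical: 275kb * 2.0 gives the logged 550kb
)
print(candidate, threshold / MIB, allowed / KIB)
```

Under these assumptions, a 257mb task exceeds both the percentage threshold (about 20.5 MiB on a 4096m heap) and the variance-based limit, so it would be flagged.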

Configuration:
jvmHeap:
  data: 4096m
  ingest: 640m
  master: 640m
replicaCount:
  data: 3
  ingest: 2
  master: 3
resources:
  data:
    limits:
      cpu: 3
      memory: 8Gi
    requests:
      cpu: 2.5
      memory: 8Gi

chirumanem · April 9, 2024, 4:41am

Hi, any update on this request?

chirumanem · April 16, 2024, 4:28am

Hi, can someone help with this issue?

Gsmitt · April 16, 2024, 4:29am

Hey @chirumanem

I might be able to help

Gsmitt · April 16, 2024, 4:33am

OK, I had to read over what you wrote.

What I understand is that you have an issue with your heap. I’m not seeing your configuration file.
What did you set your heap to?

Gsmitt · April 16, 2024, 4:39am

Hey @chirumanem

I’m not sure which installation you have, but you need to specify the initial and maximum JVM heap sizes in the jvm.options file:

  vi /etc/opensearch/jvm.options

As an example, if the host machine has 8 GB of memory, then you might want to set the initial and maximum heap sizes to 4 GB:

-Xms4g
-Xmx4g

If this is Docker, then you need to set the heap through environment variables. Hope that helps.

chirumanem · April 17, 2024, 12:43am

Hi @Gsmitt ,
Thank you for your support.

Our configuration is the same as your example, with half of the memory allocated to the heap, but we are still seeing the cancelling task warning messages and logs still appear in OpenSearch with a delay.

"message":"[monitor_only mode] cancelling task [165525] due to high resource consumption [heap usage exceeded [257mb >= 550kb]]"

The jvm.options file at the path below shows that we assigned half of the memory to the heap (we allocated 8 GB of memory and 4096 MB to the heap):

/etc/opensearch/config/jvm.options.d/jvm.options

-Xms4096m
-Xmx4096m
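The two flags above can be checked against the 8Gi container limit. This is a minimal sketch, relying on the fact that the JVM reads the `k`, `m`, and `g` size suffixes as multiples of 1024; the function name `parse_heap_flags` is hypothetical.

```python
import re

# JVM size suffixes are binary multiples.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap_flags(text: str) -> dict:
    """Map 'Xms'/'Xmx' to a byte count for each uncommented heap flag in text."""
    sizes = {}
    for line in text.splitlines():
        match = re.fullmatch(r"-(Xms|Xmx)(\d+)([kKmMgG]?)", line.strip())
        if match:
            flag, number, unit = match.groups()
            sizes[flag] = int(number) * UNITS.get(unit.lower(), 1)
    return sizes


heap = parse_heap_flags("-Xms4096m\n-Xmx4096m")
container_memory = 8 * 1024 ** 3  # the 8Gi limit from the configuration above

print(heap["Xms"] == heap["Xmx"])                   # min and max match
print(heap["Xmx"] / container_memory)               # fraction of the container limit
print(heap == parse_heap_flags("-Xms4g\n-Xmx4g"))   # same byte count as 4g
```

Because `4096m` and `4g` parse to the same number of bytes, writing the flags either way gives the JVM the same heap size.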

Regards,
Chiranjeevi

Gsmitt · April 17, 2024, 12:58am

Hey,

If you look at the jvm.options file, its default content looks like this:

## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g

Try setting it like that and restart the service.

EDIT: @chirumanem, I found another post that looks like your issue.
