I set up my tracing system with an opentelemetry -> otel-collector -> data-prepper -> opensearch architecture. My application is deployed in AWS EKS, with an EFK stack backed by Amazon OpenSearch Service.
Recently I found that my data-prepper pod hits OOM periodically. At first it used the default JVM options, which means the max heap (-Xmx) is 1/4 of the available memory; since my pod has 2 GB, the heap limit was 512 MB. When the OOM occurred, I observed that its memory usage had exceeded 600 MB, so on 2022-03-23 I raised -Xmx to 1.2 GB, and after that the memory usage stayed around 1.2 GB. Today, 2022-03-29, the OOM occurred again and the memory usage exceeded 1.3 GB. I used jmap to capture the JVM memory usage as below.
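In case it is relevant, here is roughly how the pod memory and the heap are sized together in my deployment. This is only a sketch: the JAVA_OPTS variable, container name, and image tag are placeholders, so adjust them to however your image or chart actually passes JVM flags.

# Sketch of the data-prepper container spec (names and paths are placeholders).
# Keeping -Xmx well below the pod limit leaves room for metaspace, threads,
# and direct buffers that live outside the Java heap.
containers:
  - name: data-prepper
    image: opensearchproject/data-prepper:1.2.1
    env:
      - name: JAVA_OPTS            # assumed to be honored by the image entrypoint
        value: "-Xms1200m -Xmx1200m"
    resources:
      requests:
        memory: 2Gi
      limits:
        memory: 2Gi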
I saw that the number of RawSpan objects exceeded 360,000, and likewise for the TraceGroup objects. I do not know whether this is the cause of the OOM, but I wonder why so many RawSpan and TraceGroup objects remain in memory.
Any hints or advice would be highly appreciated, thanks in advance.
Hi wwwlll2001,
Could you please share your configuration files?
Thanks,
–ddpowers
Hi ddpowers, below is my pipeline configuration file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: pipelines-conf
  labels:
    app: opentelemetry
    component: pipelines-conf
data:
  pipelines: |
    entry-pipeline:
      delay: "100"
      source:
        otel_trace_source:
          ssl: false
      buffer:
        bounded_blocking:
          buffer_size: 128
          batch_size: 8
      sink:
        - pipeline:
            name: "raw-pipeline"
        - pipeline:
            name: "service-map-pipeline"
    raw-pipeline:
      source:
        pipeline:
          name: "entry-pipeline"
      prepper:
        - otel_trace_raw_prepper:
      sink:
        - opensearch:
            hosts: {{ .Values.opensearch.hosts }}
            insecure: true
            username: {{ .Values.opensearch.username }}
            password: {{ .Values.opensearch.password }}
            trace_analytics_raw: true
    service-map-pipeline:
      delay: "100"
      source:
        pipeline:
          name: "entry-pipeline"
      prepper:
        - service_map_stateful:
      sink:
        - opensearch:
            hosts: {{ .Values.opensearch.hosts }}
            insecure: true
            username: {{ .Values.opensearch.username }}
            password: {{ .Values.opensearch.password }}
            trace_analytics_service_map: true
wwwlll2001,
It appears that this is a known issue with service maps ([BUG] OutofMemory error in service-map pipeline · Issue #1021 · opensearch-project/data-prepper · GitHub). The issue seems to affect Data Prepper 1.1 in particular, so it may be worth upgrading to 1.3 if you haven't done so already, as well as increasing the heap space.
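If the upgrade plus a larger heap does not fully resolve it, one knob that may be worth experimenting with is the service map window duration, which bounds how long spans are held in memory while service relationships are evaluated. This is only a sketch; please double-check the option name and default against the Data Prepper documentation for your version.

# Sketch only: window_duration is the documented service_map_stateful option.
service-map-pipeline:
  prepper:
    - service_map_stateful:
        window_duration: 60   # seconds; a smaller window keeps fewer spans in memory at once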
Please let me know if there is anything else I can help with!
–ddpowers
Thanks for the kind response.
I checked my Data Prepper version and it is 1.2.1. I do not know whether this version is also affected by the issue, but I have just upgraded it to 1.3.0 and hope that helps.
I will keep observing; if the issue occurs again, I will share the symptoms.
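For anyone following along, the upgrade on EKS was basically just bumping the image tag on the data-prepper deployment, roughly like this (the container name and image repository are from my setup and may differ in yours):

# Sketch of the deployment change used for the upgrade (names are placeholders).
spec:
  template:
    spec:
      containers:
        - name: data-prepper
          image: opensearchproject/data-prepper:1.3.0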