Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch and Dashboards v 2.12.0
Describe the issue:
Hello,
I am sending OTLP traces to an OTEL collector, which forwards them to a DataPrepper instance, which in turn writes them to an OpenSearch backend.
Application Traces → OTEL Collector → DataPrepper → OpenSearch
The issue I am facing is that traces with multiple spans only become visible in OpenSearch Dashboards 5-10 minutes after they are generated in the application.
I have reviewed the OTEL collector logs and found that the traces are forwarded to DataPrepper almost instantly once received. It looks like DataPrepper might be taking a long time to process traces with multiple spans before sending them to OpenSearch.
At the moment, I am testing with individual traces (around 150KB each), so I do not believe it should take this long, even with the default DataPrepper configuration.
I have tried increasing the heap size, the number of workers, and the buffer sizes, all to no avail…
I am also not seeing any warning or error messages in the DataPrepper logs (debug logging enabled).
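To put a number on the delay described above, a small polling helper along these lines could be used. This is only a sketch: `wait_until_visible` and the `is_visible` callback are hypothetical names, and the callback is assumed to wrap whatever query confirms that a given trace ID has reached the span index that DataPrepper writes to.

```python
import time
from typing import Callable, Optional


def wait_until_visible(
    trace_id: str,
    is_visible: Callable[[str], bool],
    timeout_s: float = 900.0,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until is_visible(trace_id) returns True.

    Returns the elapsed seconds at the first successful check,
    or None if timeout_s passes without the trace showing up.
    is_visible is a caller-supplied (hypothetical) check, e.g. a
    search for the trace ID in the OpenSearch span index.
    """
    start = clock()
    while True:
        if is_visible(trace_id):
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)
```

Calling it right after the application emits a trace gives the end-to-end lag, accurate to within one polling interval. The `clock` and `sleep` parameters exist only so the helper can be exercised without real waiting.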
Configuration:
OTEL Collector
extensions:
  health_check:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']
processors:
  batch:
    send_batch_size: 500
    timeout: 1s
exporters:
  debug:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200
  otlp/data-prepper:
    endpoint: '<dataprepper-ip>:21890'
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug,otlp/data-prepper]
DataPrepper Configuration
entry-pipeline:
  workers: 4
  delay: "5"
  source:
    otel_trace_source:
      ssl: false
  buffer:
    bounded_blocking:
      buffer_size: 25600
      batch_size: 500
  sink:
    - pipeline:
        name: "raw-trace-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-trace-pipeline:
  workers: 8
  source:
    pipeline:
      name: "entry-pipeline"
  buffer:
    bounded_blocking:
      buffer_size: 25600
      batch_size: 500
  processor:
    - otel_trace_raw:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        insecure: true
        username: admin
        password: admin
        index_type: trace-analytics-raw
service-map-pipeline:
  workers: 8
  delay: "5"
  source:
    pipeline:
      name: "entry-pipeline"
  buffer:
    bounded_blocking:
      buffer_size: 25600
      batch_size: 500
  processor:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        insecure: true
        username: admin
        password: admin
        index_type: trace-analytics-service-map
Any help is appreciated, thanks.
Amith