Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch Helm Chart version: 2.27.1, appVersion: 2.18.0
OpenSearch Dashboards Helm Chart version: 2.25.0, appVersion: 2.18.0
Jaeger Helm Chart version: 3.3.3, appVersion: 1.53.0
DataPrepper Helm Chart version: 0.1.0, appVersion: 2.8.0
Describe the issue:
I have applications instrumented with OpenTelemetry (OTel) agents that push traces to an OTel collector. The collector sends the data to both Jaeger and DataPrepper. However, the same traces behave differently in OpenSearch Dashboards depending on the selected data source (Jaeger vs. DataPrepper).
Specifically, when I select DataPrepper as the data source, I do not see the entire trace being marked as a trace with errors, and the errors are not displayed on the dashboard. In contrast, when using Jaeger as the data source, the errors are correctly visualized, and the entire trace is marked as an “error trace” if any span within the trace contains an error.
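For context, the collector fan-out described above has roughly the following shape. This is a minimal illustrative sketch, not the deployed collector configuration: the exporter names (otlp/jaeger, otlp/data-prepper) and both endpoint hostnames are placeholders, and port 21890 assumes the default otel_trace_source port in DataPrepper.

```yaml
# Illustrative OTel collector config: one traces pipeline, two exporters.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  # Placeholder endpoint; 4317 matches the Jaeger collector OTLP gRPC port below.
  otlp/jaeger:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true
  # Placeholder endpoint; 21890 is assumed to be the default otel_trace_source port.
  otlp/data-prepper:
    endpoint: data-prepper:21890
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Every span is sent to both backends, so both should see identical data.
      exporters: [otlp/jaeger, otlp/data-prepper]
```

With a layout like this, both backends receive the same spans, so any difference in error marking should come from how each backend stores or displays them.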
Configuration:
Jaeger:
jaeger:
  agent:
    enabled: false
  provisionDataStore:
    cassandra: false
    elasticsearch: false
  collector:
    enabled: true
    annotations: {}
    image:
      registry: ""
      repository: jaegertracing/jaeger-collector
      tag: ""
      digest: ""
    envFrom: []
    cmdlineParams: {}
    basePath: /
    replicaCount: 1
    service:
      otlp:
        grpc:
          name: "otlp-grpc"
          port: 4317
        http:
          name: "otlp-http"
          port: 4318
    serviceAccount:
      create: true
  storage:
    type: elasticsearch
    elasticsearch:
      scheme: http
      host: opensearch-cluster-master.opensearch-otel.svc.cluster.local
      port: 9200
      anonymous: true
      usePassword: false
      - name: SPAN_STORAGE_TYPE
        value: "opensearch"
      - name: ES_TAGS_AS_FIELDS_ALL
        value: "true"
      tls:
        enabled: false
DataPrepper:
config:
  otel-trace-pipeline:
    delay: "1000"
    source:
      otel_trace_source:
        ssl: false
    buffer:
      bounded_blocking:
        buffer_size: 10240
        batch_size: 160
    sink:
      - pipeline:
          name: "raw-traces-pipeline"
      - pipeline:
          name: "otel-service-map-pipeline"
  raw-traces-pipeline:
    source:
      pipeline:
        name: "otel-trace-pipeline"
    buffer:
      bounded_blocking:
        buffer_size: 10240
        batch_size: 160
    processor:
      - otel_trace_raw:
      - otel_trace_group:
          hosts: [ "http://opensearch-cluster-master:9200" ]
          insecure: true
    sink:
      - opensearch:
          hosts: [ "http://opensearch-cluster-master:9200" ]
          insecure: true
          index_type: trace-analytics-raw
  otel-service-map-pipeline:
    delay: "1000"
    source:
      pipeline:
        name: "otel-trace-pipeline"
    buffer:
      bounded_blocking:
        buffer_size: 10240
        batch_size: 160
    processor:
      - service_map_stateful:
          window_duration: 300
    sink:
      - opensearch:
          hosts: [ "http://opensearch-cluster-master:9200" ]
          insecure: true
          index_type: trace-analytics-service-map
          index: otel-v1-apm-span-%{yyyy.MM.dd}
          #max_retries: 20
          bulk_size: 4
Relevant Logs or Screenshots:
DataPrepper as the data source: the span shows an error, but the trace as a whole is not marked as an error, and no error statistics are shown.
Jaeger as the data source, at the same time: the error is shown on the span and the whole trace is marked as an error (in the top-right corner, next capture).
Same source data, same traceID in both captures.
Please share any suggestions on how to fix this. Thanks.