What's the use case of the otel-trace-group-prepper?

Hi,

Still new to OpenTelemetry. While planning my architecture, I found this plug-in mentioned in some docs, but not in the main one (https://github.com/opensearch-project/data-prepper/blob/main/docs/trace_analytics.md).

What's the use case of this plug-in? I'm not sure I understand when I should use it or not. Or is it a kind of "last chance" processor that only acts when some data is missing, to avoid losing the "links" between traces?

Should we implement it by default?

Thanks,

This one involved a lot of digging. :eyes: Here is some documentation for that and what you can expect.

https://github.com/opensearch-project/data-prepper/blob/25968dd9c63f35b1881013f35a6eed64439be278/data-prepper-plugins/otel-trace-group-prepper/README.md

Thanks for the "lmgtfy" :smiley:
I had already read it, but I wasn't able to understand in which cases the "missing trace group related fields" can actually be missing.

As said, I'm new to telemetry, so I need to understand this in order to know whether or not I should implement it.

thanks

Oh sorry, yeah, I put that in because I had a really hard time finding it myself, honestly. From what I can tell it seems helpful, as it's supposed to ensure that all the fields are correctly populated in a span. I imagine those fields might be missing otherwise because of some race conditions that could occur in processing OTel traces. That is just a theory though.

Hi @Vincent, thanks for your interest in Data Prepper. The otel-trace-group-prepper processor acts as a safeguard: it populates late-arriving child spans with trace group info, because otel-trace-raw-prepper only has a limited time window in which to populate that info. It can also be helpful when you run multiple Data Prepper instances with peer-forwarder, since during scale-in and scale-out spans with the same traceId might be forwarded to different instances.
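
To make that concrete, here is a minimal sketch of how the prepper slots in right after otel-trace-raw-prepper; the pipeline name, host, and credentials below are placeholders, and the option names follow the README linked above:

raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - otel_trace_raw_prepper:
    # Queries the configured OpenSearch backend to backfill traceGroup /
    # traceGroupFields on late-arriving child spans.
    - otel_trace_group_prepper:
        hosts: [ "https://localhost:9200" ]
        username: "admin"
        password: "admin"
  sink:
    - opensearch:
        hosts: [ "https://localhost:9200" ]
        index_type: trace-analytics-raw
        username: "admin"
        password: "admin"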

Thanks for the feedback. So the recommendation is to put this in place in an environment where we have process queuing, like with RabbitMQ.

As of now, we should use the OTel Trace Raw Processor and no longer otel_trace_raw_prepper.
Since the otel-trace-group-prepper seems to work on the output of otel_trace_raw_prepper, is it still relevant for the OTel Trace Raw Processor? Are they compatible, or will it become obsolete like otel_trace_raw_prepper in Data Prepper 1.4?

Thanks

@Vincent

Since the otel-trace-group-prepper seems to work on the output of otel_trace_raw_prepper, is it still relevant for the OTel Trace Raw Processor?

Yes

Are they compatible, or will it become obsolete like otel_trace_raw_prepper in Data Prepper 1.4?

The processor otel_trace_group_prepper will no longer be supported in 2.0. Instead, it will be replaced by otel_trace_group, which has essentially the same functionality but is tailored to our Data Prepper internal event model.
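
For reference, a sketch of how the processor section might look with the replacement (host and credentials here are placeholders):

  processor:
    # otel_trace_raw works on the internal event model
    # (requires record_type: "event" on the otel_trace_source).
    - otel_trace_raw:
    # otel_trace_group replaces otel_trace_group_prepper.
    - otel_trace_group:
        hosts: [ "https://localhost:9200" ]
        username: "admin"
        password: "admin"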

Hi @Vincent, @qchea - if anyone has implemented this otel-trace-group, could you please share the full pipeline sections (source, processor, and sink)? I think the GitHub doc below is missing complete examples:
https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/otel-trace-group-prepper

Below is my current pipeline. Currently I'm facing an issue where some requests are missing the following fields:

  "traceGroup": null,
  "traceGroupFields.endTime": null,
  "traceGroupFields.statusCode": null,
  "traceGroupFields.durationInNanos": null,
otel-trace-pipeline:
  # workers is the number of threads processing data in each pipeline. 
  # We recommend same value for all pipelines.
  # default value is 1, set a value based on the machine you are running Data Prepper
  workers: 2 
  # delay in milliseconds is how often the worker threads should process data.
  # Recommend not to change this config as we want the otel-trace-pipeline to process as quick as possible
  # default value is 3_000 ms
  delay: "100" 
  source:
    otel_trace_source:
      ssl: false # Change this to enable encryption in transit
      authentication:
        unauthenticated:
  buffer:
    bounded_blocking:
       # buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memory.
       # We recommend keeping the same buffer_size for all pipelines.
       # Make sure you configure sufficient heap
       # default value is 512
       buffer_size: 512
       # This is the maximum number of requests each worker thread will process within the delay.
       # Default is 8.
       # Make sure buffer_size >= workers * batch_size
       batch_size: 8
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  # Configure same as the otel-trace-pipeline
  workers: 2
  # We recommend using the default value for the raw-pipeline.
  delay: "3000" 
  source:
    pipeline:
      name: "otel-trace-pipeline"
  buffer:
      bounded_blocking:
         # Configure the same value as in otel-trace-pipeline
         # Make sure you configure sufficient heap
         # default value is 512
         buffer_size: 512
         # The raw processor does bulk request to your OpenSearch sink, so configure the batch_size higher.
         # If you use the recommended otel-collector setup each ExportTraceRequest could contain max 50 spans. https://github.com/opensearch-project/data-prepper/tree/v0.7.x/deployment/aws
         # With 64 as batch size each worker thread could process up to 3200 spans (64 * 50)
         batch_size: 64
  processor:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        hosts: [ "http://10.81.211.25:9200" ]
        #trace_analytics_raw: true
        index_type: trace-analytics-raw
        insecure: true
        # Change to your credentials
        username: "admin"
        password: "admin"
        # Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate  
        #cert: /path/to/cert
        # If you are connecting to an Amazon OpenSearch Service domain without
        # Fine-Grained Access Control, enable these settings. Comment out the
        # username and password above.
        #aws_sigv4: true
        #aws_region: us-east-1
service-map-pipeline:
  workers: 2
  delay: "100"
  source:
    pipeline:
      name: "otel-trace-pipeline"
  processor:
    - service_map_stateful:
        # The window duration is the maximum length of time the data prepper stores the most recent trace data to evaluate service-map relationships.
        # The default is 3 minutes, which means we can detect relationships between services from spans reported in the last 3 minutes.
        # Set a higher value if your applications have higher latency.
        window_duration: 600 
  buffer:
      bounded_blocking:
         # buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memory.
         # We recommend keeping the same buffer_size for all pipelines.
         # Make sure you configure sufficient heap
         # default value is 512
         buffer_size: 512
         # This is the maximum number of requests each worker thread will process within the delay.
         # Default is 8.
         # Make sure buffer_size >= workers * batch_size
         batch_size: 8
  sink:
    - opensearch:
        hosts: [ "http://10.81.211.25:9200" ]
        #trace_analytics_service_map: true
        index_type: trace-analytics-service-map
        insecure: true
        # Change to your credentials
        username: "admin"
        password: "admin"
        # Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate  
        #cert: /path/to/cert
        # If you are connecting to an Amazon OpenSearch Service domain without
        # Fine-Grained Access Control, enable these settings. Comment out the
        # username and password above.
        #aws_sigv4: true
        #aws_region: us-east-1
log-pipeline:
  source:
    http:
      ssl: false
  sink:
    - opensearch:
        hosts: ["http://10.81.211.25:9200"]
        insecure: true
        username: "admin"
        password: "admin"
        index: dcep-logs-%{yyyy.MM.dd}

Thanks!

I was able to configure it like below, but got an exception in the log.

trace-group-pipeline:
  source:
    pipeline:
      name: "raw-pipeline"
  processor:
    - otel_trace_group:
        hosts: ["http://10.81.211.25:9200"]
        username: "admin"
        password: "admin"
  sink:
    - opensearch:
        hosts: ["http://10.81.211.25:9200"]
        insecure: true
        username: "admin"
        password: "admin"

Exception; I can't see any other details:
2022-09-01T16:36:19,094 [main] INFO com.amazon.dataprepper.pipeline.Pipeline - Pipeline [trace-group-pipeline] - Initiating pipeline execution
2022-09-01T16:36:19,094 [main] INFO com.amazon.dataprepper.pipeline.Pipeline - Pipeline [trace-group-pipeline] - Submitting request to initiate the pipeline processing
2022-09-01T16:36:22,096 [trace-group-pipeline-processor-worker-9-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - trace-group-pipeline Worker: No records received from buffer
2022-09-01T16:37:24,801 [trace-group-pipeline-processor-worker-9-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - trace-group-pipeline Worker: Processing 8 records from buffer
2022-09-01T16:37:24,802 [trace-group-pipeline-processor-worker-9-thread-1] ERROR com.amazon.dataprepper.pipeline.ProcessWorker - Encountered exception during pipeline trace-group-pipeline processing

I would recommend adding the otel-trace-group-prepper to your existing pipeline configuration after the otel-trace-raw-prepper.
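
For example, in your raw-pipeline that would look roughly like this (a sketch reusing the hosts and credentials from your existing config; the otel_trace_group_prepper options here are assumed to match those of otel_trace_group):

  processor:
    - otel_trace_raw_prepper:
    - otel_trace_group_prepper:
        hosts: [ "http://10.81.211.25:9200" ]
        username: "admin"
        password: "admin"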

As @qchea mentioned, the otel-trace-group-prepper is being deprecated in favor of otel-trace-group in 2.0. The difference between the two is the internal model they expect. Your pipeline definition, as is, leverages the old model and needs to use otel-trace-group-prepper.

If you want to use otel-trace-group and the new model, you need to update your source plugin slightly to change the internal data model leveraged by the source, and migrate to otel_trace_raw instead of otel_trace_raw_prepper:

...
  source:
    otel_trace_source:
      record_type: "event"
...
  processor:
    - otel_trace_raw:
...

Thank you very much, @cmanning09! I'm getting a better understanding. I've changed my pipeline to otel_trace_raw and the source to record_type: "event". Now the pipeline is able to process the trace information successfully, but I'm still getting null values for some of the requests, and in the UI Trace dashboard it's empty. What could be the problem?
My flow: raw-pipeline → trace-group-pipeline → opensearch sink to send data…
Updated Pipeline script:

otel-trace-pipeline:
  # workers is the number of threads processing data in each pipeline. 
  # We recommend same value for all pipelines.
  # default value is 1, set a value based on the machine you are running Data Prepper
  #workers: 2 
  # delay in milliseconds is how often the worker threads should process data.
  # Recommend not to change this config as we want the otel-trace-pipeline to process as quick as possible
  # default value is 3_000 ms
  delay: "100" 
  source:
    otel_trace_source:
      ssl: false # Change this to enable encryption in transit
      record_type: event
      authentication:
        unauthenticated:
  buffer:
    bounded_blocking:
       # buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memory.
       # We recommend keeping the same buffer_size for all pipelines.
       # Make sure you configure sufficient heap
       # default value is 512
       buffer_size: 10240
       # This is the maximum number of requests each worker thread will process within the delay.
       # Default is 8.
       # Make sure buffer_size >= workers * batch_size
       batch_size: 160
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "otel-trace-pipeline"
  buffer:
      bounded_blocking:
         # Configure the same value as in otel-trace-pipeline
         # Make sure you configure sufficient heap
         # default value is 512
         buffer_size: 10240
         # The raw processor does bulk request to your OpenSearch sink, so configure the batch_size higher.
         # If you use the recommended otel-collector setup each ExportTraceRequest could contain max 50 spans. https://github.com/opensearch-project/data-prepper/tree/v0.7.x/deployment/aws
         # With 64 as batch size each worker thread could process up to 3200 spans (64 * 50)
         batch_size: 160
  processor:
    - otel_trace_raw:
  sink:
    - pipeline:
        name: "trace-group-pipeline"

trace-group-pipeline:
  #workers: 2
  source:
    pipeline:
      name: "raw-pipeline"
  processor:
    - otel_trace_group:
        hosts: ["http://10.81.211.25:9200"]
        username: "admin"
        password: "admin"
  sink:
    - opensearch:
        hosts: ["http://10.81.211.25:9200"]
        index_type: trace-analytics-raw
        insecure: true
        username: "admin"
        password: "admin"

service-map-pipeline:
  #workers: 2
  delay: "100"
  source:
    pipeline:
      name: "otel-trace-pipeline"
  processor:
    - service_map_stateful:
        # The window duration is the maximum length of time the data prepper stores the most recent trace data to evaluate service-map relationships.
        # The default is 3 minutes, which means we can detect relationships between services from spans reported in the last 3 minutes.
        # Set a higher value if your applications have higher latency.
        #window_duration: 180 
  buffer:
      bounded_blocking:
         # buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memory.
         # We recommend keeping the same buffer_size for all pipelines.
         # Make sure you configure sufficient heap
         # default value is 512
         buffer_size: 10240
         # This is the maximum number of requests each worker thread will process within the delay.
         # Default is 8.
         # Make sure buffer_size >= workers * batch_size
         batch_size: 160
  sink:
    - opensearch:
        hosts: [ "http://10.81.211.25:9200" ]
        #trace_analytics_service_map: true
        index_type: trace-analytics-service-map
        insecure: true
        # Change to your credentials
        username: "admin"
        password: "admin"
        # Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate  
        #cert: /path/to/cert
        # If you are connecting to an Amazon OpenSearch Service domain without
        # Fine-Grained Access Control, enable these settings. Comment out the
        # username and password above.
        #aws_sigv4: true
        #aws_region: us-east-1

Null TraceGroup sample data:

[
  {
    "_index": "otel-v1-apm-span-000001",
    "_id": "4b7e5ca59b40be05",
    "_score": 7.4577246,
    "_source": {
      "traceId": "06ae51d662ff69c132eae6416b3d50ff",
      "droppedLinksCount": 0,
      "kind": "SPAN_KIND_CLIENT",
      "droppedEventsCount": 0,
      "traceGroupFields": {
        "endTime": null,
        "durationInNanos": null,
        "statusCode": null
      },
      "traceGroup": null,
      "serviceName": "dev-agent-api",
      "parentSpanId": "0fb5563b62c38f8c",
      "spanId": "4b7e5ca59b40be05",
      "traceState": "",
      "name": "dcep",
      "startTime": "2022-09-02T09:01:03.756723300Z",
      "links": [],
      "endTime": "2022-09-02T09:01:03.758052500Z",
      "droppedAttributesCount": 0,
      "durationInNanos": 1329200,
      "events": [],
      "span.attributes.db@statement_type": "Text",
      "instrumentationLibrary.version": "1.0.0.0",
      "resource.attributes.service@instance@id": "d15bf4b8-3d89-41d3-a4de-288a0133bddc",
      "span.attributes.db@name": "dcep",
      "resource.attributes.service@name": "dev-agent-api",
      "status.code": 0,
      "span.attributes.db@system": "postgresql",
      "instrumentationLibrary.name": "OpenTelemetry.EntityFrameworkCore"
    }
  },
  {
    "_index": "otel-v1-apm-span-000001",
    "_id": "0fb5563b62c38f8c",
    "_score": 7.4577246,
    "_source": {
      "traceId": "06ae51d662ff69c132eae6416b3d50ff",
      "droppedLinksCount": 0,
      "kind": "SPAN_KIND_SERVER",
      "droppedEventsCount": 0,
      **"traceGroupFields": {**
**        "endTime": null,**
**        "durationInNanos": null,**
**        "statusCode": null**
**      },**
**      "traceGroup": null,**
      "serviceName": "dev-agent-api",
      "parentSpanId": "df2b3492e7bba1b0",
      "spanId": "0fb5563b62c38f8c",
      "traceState": "",
      "name": "api/Config/skills",
      "startTime": "2022-09-02T09:01:03.752806800Z",
      "links": [],
      "endTime": "2022-09-02T09:01:03.759687900Z",
      "droppedAttributesCount": 0,
      "durationInNanos": 6881100,
      "events": [],
      "span.attributes.http@url": "http://croma.dcepdev.punelab.local/agent/api/Config/skills",
      "instrumentationLibrary.version": "1.0.0.0",
      "resource.attributes.service@instance@id": "d15bf4b8-3d89-41d3-a4de-288a0133bddc",
      "resource.attributes.service@name": "dev-agent-api",
      "status.code": 0,
      "instrumentationLibrary.name": "OpenTelemetry.Instrumentation.AspNetCore",
      "span.attributes.http@method": "GET",
      "span.attributes.http@user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
      "span.attributes.http@route": "api/Config/skills",
      "span.attributes.http@host": "croma.dcepdev.punelab.local",
      "span.attributes.http@target": "/agent/api/Config/skills",
      "span.attributes.http@scheme": "http",
      "span.attributes.http@flavor": "1.1",
      "span.attributes.http@status_code": 200
    }
  }
]

Hi @arulselvanj,

Based on the example here (https://github.com/opensearch-project/data-prepper/blob/main/docs/trace_analytics.md), you should change your processor configuration for the trace-group-pipeline to:

processor:
    - otel_trace_raw:
    - otel_trace_group:
        hosts: ["http://10.81.211.25:9200"]
        username: "admin"
        password: "admin"