Unable to use copy_values on fields with dots in key (from an otel_trace_source)

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
latest - 2.1.0

Describe the issue:
I am using Data Prepper to send metrics and traces to OpenSearch. That part works well: I am getting spans in one index and metrics in another, for a Spring Boot application.

Both indices have fields that indicate the name of the platform. But in one index the field is called “metric.attributes.platform”, and in the other one it is called “span.attributes.platform”. I would like to have a common name for these fields, for example just “platform”.

I figured that I should be able to do this with an additional processor, but so far I have not been able to make it work.

I tried copy_values, but apparently this doesn’t work on fields with dots in the key. At least not on the fields that I am getting from the otel_trace_source. It does work when I try the copy_values processor in test-setups. It even works when I am adding fields like “span.attributes.platform” in a preceding pipeline. But I am not able to make it work for the fields with dots that I am getting from the otel_trace_source. Somehow these fields must be special?

For example, in my pipeline I have:

entry-pipeline:
  source:
    otel_trace_source:
      ssl: false
  sink:
    - stdout:
    - pipeline:
        name: "intermediate-pipeline"
intermediate-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - copy_values:
        entries:
        - from_key: "span.attributes.localBaseUrl"
          to_key: localBaseUrl
        - from_key: kind
          to_key: "span.attributes.platform"
  sink:
    - stdout:
    - pipeline:
        name: "raw-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "intermediate-pipeline"
  processor:
    - otel_trace_raw:
    - copy_values:
        entries:
        - from_key: "span.attributes.platform"
          to_key: platform
  sink:
    - stdout:

So 3 steps, each of which logs the result to stdout. I would expect the output (on stdout) to be:

  • step 1: a copy of the input
  • step 2: an intermediate version with a new field “localBaseUrl” whose value is a copy of “span.attributes.localBaseUrl”, and a new value for the field “span.attributes.platform”, copied from the “kind” field.
  • step 3: a final version with an additional field “platform” that has the value of the field “span.attributes.platform”.

The result is below. It shows that the intermediate step did not copy the value of “span.attributes.localBaseUrl”. It did replace the value of the field “span.attributes.platform”.
The next step did copy the (new) value of the field “span.attributes.platform” to a new field “platform”.

My impression is that the original fields with dots in their name (“span.attributes.localBaseUrl”) are somehow special, which causes copy_values to ignore them. The field that I added in step 2 (“span.attributes.platform”) does not have this special property (even though it has the same name as an original field, which it replaces). The “kind” field (a field without dots) also does not have this property; it is copied as expected.


{"traceId":"640c80b3c8882590f94f8af03b5b8380","droppedLinksCount":0,"kind":"SPAN_KIND_SERVER","droppedEventsCount":0,
"traceGroupFields":{"endTime":"2023-03-11T13:22:59.212675Z","durationInNanos":15632000,"statusCode":0},
"parentSpanId":"","spanId":"f94f8af03b5b8380","traceState":"","startTime":"2023-03-11T13:22:59.197043Z","links":[],
"endTime":"2023-03-11T13:22:59.212675Z","droppedAttributesCount":0,"durationInNanos":15632000,"events":[],
"span.attributes.net@peer@ip":"127.0.0.1","span.attributes.release":"99.0",
"span.attributes.localBaseUrl":"http://localhost","status.code":0,"span.attributes.http@method":"GET",
"span.attributes.stage":"dev","span.attributes.namespace":"dev-local","span.attributes.pod":"localhost-id",
"span.attributes.platform":"local","span.attributes.net@host@ip":"172.18.20.100"}

{"traceId":"640c80b3c8882590f94f8af03b5b8380","droppedLinksCount":0,"kind":"SPAN_KIND_SERVER","droppedEventsCount":0,
"traceGroupFields":{"endTime":"2023-03-11T13:22:59.212675Z","durationInNanos":15632000,"statusCode":0},
"parentSpanId":"","spanId":"f94f8af03b5b8380","traceState":"","startTime":"2023-03-11T13:22:59.197043Z","links":[],
"endTime":"2023-03-11T13:22:59.212675Z","droppedAttributesCount":0,"durationInNanos":15632000,"events":[],
"span.attributes.platform":"SPAN_KIND_SERVER","span.attributes.net@peer@ip":"127.0.0.1","span.attributes.release":"99.0",
"span.attributes.localBaseUrl":"http://localhost","status.code":0,"span.attributes.http@method":"GET",
"span.attributes.stage":"dev","span.attributes.namespace":"dev-local", "span.attributes.pod":"localhost-id",
"span.attributes.net@host@ip":"172.18.20.100"}

{"traceId":"640c80b3c8882590f94f8af03b5b8380","droppedLinksCount":0,"kind":"SPAN_KIND_SERVER","droppedEventsCount":0,
"traceGroupFields":{"endTime":"2023-03-11T13:22:59.212675Z","durationInNanos":15632000,"statusCode":0},
"parentSpanId":"","spanId":"f94f8af03b5b8380","traceState":"","startTime":"2023-03-11T13:22:59.197043Z","links":[],
"endTime":"2023-03-11T13:22:59.212675Z","droppedAttributesCount":0,"durationInNanos":15632000,"events":[],
"span.attributes.platform":"SPAN_KIND_SERVER",
"platform":"SPAN_KIND_SERVER","span.attributes.net@peer@ip":"127.0.0.1","span.attributes.release":"99.0",
"span.attributes.localBaseUrl":"http://localhost","status.code":0,"span.attributes.http@method":"GET",
"span.attributes.stage":"dev","span.attributes.namespace":"dev-local","span.attributes.pod":"localhost-id",
"span.attributes.net@host@ip":"172.18.20.100"}

Hello, @Willem, thank you for the post! I was able to reproduce your issue following the information above, and I was able to resolve it by using JSON Pointer syntax for the key: "attributes/span.attributes.localBaseUrl" (I was testing on another key, but I believe the underlying issue is the same).

The issue is that the JSON you saw from the sink (stdout in this case) was flattened, with the “attributes” node removed. The data that the copy_values processor actually processed has the “span.attributes.localBaseUrl” key under “attributes”:

{
  ...
  "attributes": {
    "span.attributes.localBaseUrl": "some_value",
    ...
  }
}

We support json pointer syntax in from_key, so using “attributes/span.attributes.localBaseUrl” would let the copy processor find the key.
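
Applied to the intermediate-pipeline from your post, the entry might look roughly like this. It is a sketch only: I tested with a different key, so the exact path for localBaseUrl is an assumption based on the structure shown above.

```yaml
# Sketch only: the copy_values entry from the question, with from_key written
# as a path into the nested "attributes" object instead of a top-level key.
processor:
  - copy_values:
      entries:
        - from_key: "attributes/span.attributes.localBaseUrl"
          to_key: localBaseUrl
```

The same pattern should apply to the entry in raw-pipeline, i.e. from_key “attributes/span.attributes.platform” with to_key platform, assuming the platform attribute also sits under “attributes”.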

Hope this helps!

Yes! Thank you very much.
