OpenSearch Sink - Connection timeout

Hi Team, I’m trying to connect Data Prepper to my domain deployed on the AWS OpenSearch service, using this pipeline configuration:

otel-trace-pipeline:
  # workers is the number of threads processing data in each pipeline. 
  # We recommend same value for all pipelines.
  # default value is 1, set a value based on the machine you are running Data Prepper
  workers: 8
  # delay in milliseconds is how often the worker threads should process data.
  # Recommend not to change this config as we want the otel-trace-pipeline to process as quick as possible
  # default value is 3_000 ms
  delay: "100"
  source:
    otel_trace_source:
      ssl: false # Change this to enable encryption in transit
      authentication:
        unauthenticated:
  buffer:
    bounded_blocking:
       # buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memory.
       # We recommend to keep the same buffer_size for all pipelines. 
       # Make sure you configure sufficient heap
       # default value is 512
       buffer_size: 512
       # This is the maximum number of requests each worker thread will process within the delay.
       # Default is 8.
       # Make sure buffer_size >= workers * batch_size
       batch_size: 8
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  # Configure same as the otel-trace-pipeline
  workers: 8
  # We recommend using the default value for the raw-pipeline.
  delay: "3000"
  source:
    pipeline:
      name: "otel-trace-pipeline"
  buffer:
      bounded_blocking:
         # Configure the same value as in otel-trace-pipeline
         # Make sure you configure sufficient heap
         # default value is 512
         buffer_size: 512
         # The raw processor does bulk request to your OpenSearch sink, so configure the batch_size higher.
         # If you use the recommended otel-collector setup each ExportTraceRequest could contain max 50 spans. https://github.com/opensearch-project/data-prepper/tree/v0.7.x/deployment/aws
         # With 64 as batch size each worker thread could process up to 3200 spans (64 * 50)
         batch_size: 64
  processor:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        hosts: [ "https://vpc-logs-dev-**********.us-west-1.es.amazonaws.com" ]
        trace_analytics_raw: true
        # Change to your credentials
        username: "********"
        password: "********"
        index: test
        # Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate  
        #cert: /path/to/cert
        # If you are connecting to an Amazon OpenSearch Service domain without
        # Fine-Grained Access Control, enable these settings. Comment out the
        # username and password above.
        #aws_sigv4: true
        #aws_region: us-east-1
service-map-pipeline:
  workers: 8
  delay: "100"
  source:
    pipeline:
      name: "otel-trace-pipeline"
  processor:
    - service_map_stateful:
        # The window duration is the maximum length of time the data prepper stores the most recent trace data to evaluate service-map relationships.
        # The default is 3 minutes, this means we can detect relationships between services from spans reported in last 3 minutes.
        # Set higher value if your applications have higher latency. 
        window_duration: 180
  buffer:
      bounded_blocking:
         # buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memory.
         # We recommend to keep the same buffer_size for all pipelines. 
         # Make sure you configure sufficient heap
         # default value is 512
         buffer_size: 512
         # This is the maximum number of requests each worker thread will process within the delay.
         # Default is 8.
         # Make sure buffer_size >= workers * batch_size
         batch_size: 8
  sink:
    - opensearch:
        hosts: [ "https://vpc-logs-dev-********.us-west-1.es.amazonaws.com" ]
        trace_analytics_service_map: true
        # Change to your credentials
        username: "******"
        password: "******"
                          

After I start the Docker container, I get the following error:

Caused by: java.lang.RuntimeException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
	at com.amazon.dataprepper.plugins.sink.opensearch.OpenSearchSink.<init>(OpenSearchSink.java:92) ~[data-prepper.jar:1.5.1]
	... 82 more
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
	at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:892) ~[data-prepper.jar:1.5.1]

My understanding is that Data Prepper is not able to connect to the domain. I’m new to OpenSearch and I’m struggling to understand why the connection is failing.

Regards

Just to add more information, I’ve tried both of the following on the AWS OpenSearch service:

  • Fine-grained access control only
  • An access control policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "es:*",
      "Resource": "arn:aws:es:us-west-1:*******:domain/logs-dev/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::*****:user/jose@getkalto.com"
      },
      "Action": "es:ESHttp*",
      "Resource": [
        "arn:aws:es:us-west-1:*****:domain/logs-dev/otel-v1*",
        "arn:aws:es:us-west-1:*****:domain/logs-dev/_template/otel-v1*",
        "arn:aws:es:us-west-1:*****:domain/logs-dev/_plugins/_ism/policies/raw-span-policy",
        "arn:aws:es:us-west-1:*****:domain/logs-dev/_alias/otel-v1*"
      ]
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::*******:user/*****"
      },
      "Action": "es:ESHttpGet",
      "Resource": "arn:aws:es:us-west-1:******:domain/logs-dev/_cluster/settings"
    }
  ]
}

I still get the connection timeout.

Hi @jocheinfa,

It is interesting that the reported error is a SocketTimeoutException. I don’t know if this will completely fix your problem, but you do have an invalid configuration in the first opensearch sink settings. When either trace_analytics_raw or trace_analytics_service_map is set to true, you cannot also set the index parameter. For your config, this means that you should change

  processor:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        hosts: [ "https://vpc-logs-dev-**********.us-west-1.es.amazonaws.com" ]
        trace_analytics_raw: true
        # Change to your credentials
        username: "********"
        password: "********"
        index: test

to

  processor:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        hosts: [ "https://vpc-logs-dev-**********.us-west-1.es.amazonaws.com" ]
        trace_analytics_raw: true
        # Change to your credentials
        username: "********"
        password: "********"

Other than that, your configuration looks correct. You could also try setting the aws_region field to us-west-1 to see if that helps at all.

The connection problem is most likely due to the Amazon OpenSearch domain living in a VPC. If your domain is in a VPC and not using public access, Data Prepper will need to be running in the same VPC as OpenSearch, or in a VPC that has access to the OpenSearch VPC, to connect properly.

I made the changes mentioned and also used:

        aws_sigv4: true
        aws_region: us-west-1

and it worked.
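
For anyone landing here later, the raw-pipeline sink after these changes would look roughly like the sketch below. It only combines what was said in this thread: the index line is removed, and aws_sigv4 and aws_region are added. Whether username and password were kept is not stated above; commenting them out is an assumption, based on the template comment that says to do so when enabling the SigV4 settings. The service-map-pipeline sink would presumably get the same two aws_ settings.

```yaml
  sink:
    - opensearch:
        hosts: [ "https://vpc-logs-dev-**********.us-west-1.es.amazonaws.com" ]
        trace_analytics_raw: true
        # "index" is removed: it cannot be set together with trace_analytics_raw.
        # Assumption: credentials are commented out once SigV4 signing is enabled,
        # as the comments in the original template suggest.
        #username: "********"
        #password: "********"
        aws_sigv4: true
        aws_region: us-west-1
```

Note that these settings only affect how requests are signed. If the domain sits in a VPC, Data Prepper still has to run somewhere that can reach the VPC endpoint, as mentioned in the reply above.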