Logstash not pushing data to AWS OpenSearch

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.13 (latest)

Describe the issue:
I have an AWS OpenSearch domain into which I am trying to ingest JSON files from a GCS bucket using Logstash. I can see a log stating that Logstash has connected to the OpenSearch domain, and a few other logs stating that it is trying to fetch blobs from the GCS bucket.

This is my Logstash configuration.

input {
  google_cloud_storage {
    bucket_id => "test-bucket"
    json_key_file => "/etc/logstash/credentials.json"
    codec => "json_lines"
  }
}

filter {
}

output {
  opensearch {
    hosts => "https://<name>.us-east-1.es.amazonaws.com:443"
    user => "admin"
    password => "admin"
    index => "logstash-test-1"
    ssl_certificate_verification => true
  }
}

I am running this config on GKE. I built a custom image from the official Elasticsearch Logstash image and installed the OpenSearch output plugin on it. These are my logs.

Logs showing that the OpenSearch connection was successful:

[2024-06-18T21:26:20,336][INFO ][logstash.javapipeline    ] Pipeline `main` is configured with `pipeline.ecs_compatibility: v8` setting. All plugins in this pipeline will default to `ecs_compatibility => v8` unless explicitly configured otherwise.
[2024-06-18T21:26:20,353][INFO ][logstash.outputs.opensearch][main] New OpenSearch output {:class=>"LogStash::Outputs::OpenSearch", :hosts=>["https://search-<domain>.us-east-1.es.amazonaws.com:443"]}
[2024-06-18T21:26:20,371][INFO ][logstash.outputs.opensearch][main] OpenSearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://admin:xxxxxx@search-<domain>.us-east-1.es.amazonaws.com:443/]}}
[2024-06-18T21:26:20,639][WARN ][logstash.outputs.opensearch][main] Restored connection to OpenSearch instance {:url=>"https://admin:xxxxxx@search-<domain>.us-east-1.es.amazonaws.com:443/"}
[2024-06-18T21:26:20,696][INFO ][logstash.outputs.opensearch][main] Cluster version determined (2.13.0) {:version=>2}
[2024-06-18T21:26:20,735][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>500, "pipeline.sources"=>["/usr/share/logstash/pipeline/logstash.conf"], :thread=>"#<Thread:0x701b2684 /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}

This log keeps appearing continuously, but there is no sign of new documents being found or pushed to the domain:
Fetching blobs from test-bucket

Also, these logs appear very often, but I am assuming they are due to the license checker looking for a live Elasticsearch cluster that doesn't exist here. Let me know if these are safe to ignore:

[2024-06-18T21:27:43,039][ERROR][logstash.licensechecker.licensereader] Unable to retrieve Elasticsearch cluster info. {:message=>"No Available connections", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError}

[2024-06-18T21:27:43,041][ERROR][logstash.licensechecker.licensereader] Unable to retrieve license information from license server {:message=>"No Available connections"}

[2024-06-18T21:27:43,099][INFO ][logstash.licensechecker.licensereader] Failed to perform request {:message=>"elasticsearch: Name or service not known", :exception=>Manticore::ResolutionFailure, :cause=>#<Java::JavaNet::UnknownHostException: elasticsearch: Name or service not known>}

[2024-06-18T21:27:43,101][WARN ][logstash.licensechecker.licensereader] Attempted to resurrect connection to dead ES instance, but got an error {:url=>"http://elasticsearch:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :message=>"Elasticsearch Unreachable: [http://elasticsearch:9200/][Manticore::ResolutionFailure] elasticsearch: Name or service not known"}

Cheers

@RandomDD If you’re not sure which part of this pipeline is failing, I would suggest testing with a file input instead, to see whether the index gets created on the OpenSearch side.
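For example, a minimal sketch of such a test pipeline (the file path is a placeholder; the output block mirrors your config):

input {
  file {
    path => "/tmp/sample.json"      # hypothetical test file with one JSON object per line
    start_position => "beginning"
    sincedb_path => "/dev/null"     # don't persist read state, so the file is re-read on restart
    codec => "json_lines"
  }
}
output {
  opensearch {
    hosts => "https://<name>.us-east-1.es.amazonaws.com:443"
    user => "admin"
    password => "admin"
    index => "logstash-test-1"
  }
}

If documents land in logstash-test-1 with this input, the problem is on the GCS side rather than the output.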

You could also try the test example from the OS documentation (Step 4).

In your Logstash config you set ssl_certificate_verification to true, which means cacert should also be set. As a test, you can set ssl to true and ssl_certificate_verification to false.
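For example (a sketch; the cacert path is hypothetical and only needed if verification stays on):

output {
  opensearch {
    hosts => "https://<name>.us-east-1.es.amazonaws.com:443"
    user => "admin"
    password => "admin"
    index => "logstash-test-1"
    ssl => true
    # either trust a CA bundle explicitly...
    # cacert => "/etc/logstash/root-ca.pem"
    # ...or disable verification, for testing only:
    ssl_certificate_verification => false
  }
}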

In addition, the error message Unable to retrieve Elasticsearch cluster info... comes from logstash.licensechecker.licensereader. You can try the Logstash OSS version instead to get rid of it: the OSS version doesn't include any X-Pack features or the built-in license checker.

The OpenSearch output config seems fine to me, because I was able to import data from a local file, mounted via a k8s ConfigMap, into the same domain using the same output plugin config.
Is the GCS input even supported by a Logstash build that supports OpenSearch as an output?

Also, I get these logs while trying to upload:

[2024-06-19T07:44:48,991][INFO ][logstash.inputs.googlecloudstorage][main] ProcessedDb created in: /usr/share/logstash/data/plugins/inputs/google_cloud_storage/db
[2024-06-19T07:44:48,994][INFO ][logstash.inputs.googlecloudstorage][main] Turn on debugging to explain why blobs are filtered.
[2024-06-19T07:44:48,995][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2024-06-19T07:44:49,002][INFO ][logstash.inputs.googlecloudstorage][main][011147c1b5e37daed60853590135a668b4d759561693bc05656b7b312899127c] Fetching blobs from test-bucket

Is there a way to turn on debug logging while my Logstash is running on K8s? I tried the following, but it fails.

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
  namespace: logstash
data:
  logstash.yml: |
    path.logs: /usr/share/logstash/logs
    config.debug: true
    log.level: debug
  logstash.conf: |
    input {
      google_cloud_storage {
        bucket_id => "test-bucket"
        json_key_file => "/etc/logstash/credentials.json"
        codec => "json_lines"
      }
    }

    output {
      opensearch {
        hosts => "https://search-<domain>.us-east-1.es.amazonaws.com:443/"
        user => "admin"
        password => "admin"
        index => "logstash-test-1"
        ssl_certificate_verification => true
      }
    }
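An alternative I may try, assuming the custom image's default working directory of /usr/share/logstash: override the container command in the Deployment so the log level is passed as a CLI flag (the container name and image are placeholders):

containers:
  - name: logstash
    image: my-custom-logstash:latest    # hypothetical custom image
    # `command` overrides the image entrypoint, so this runs Logstash directly in debug mode
    command: ["bin/logstash", "--log.level=debug"]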

@RandomDD As per @gaobinlong's comment, try the Logstash image with the OpenSearch output plugin:

https://hub.docker.com/r/opensearchproject/logstash-oss-with-opensearch-output-plugin

Also, as per @gaobinlong's comment, you're missing ssl => true in your output config. Please follow the example from the OS documentation shared in my previous post.

I can see OpenSearch working without any issues, as I am able to push files from a ConfigMap to the same domain using a different input and the same output. So I don't think there are any issues with the output config. Anyway, I changed both according to your suggestions and used the OpenSearch Logstash image.

This is the custom Docker image I built from it.

FROM opensearchproject/logstash-oss-with-opensearch-output-plugin
USER root
# shared-mime-info provides the MIME type database that the GCS input plugin's dependencies need
RUN apt-get update && apt-get install -y shared-mime-info && apt-get clean
# Add the GCS input plugin; the OpenSearch output plugin is already bundled in the base image
RUN bin/logstash-plugin install logstash-input-google_cloud_storage
WORKDIR /usr/share/logstash
CMD ["bin/logstash"]

And this is the updated Logstash config.

    input {
      google_cloud_storage {
        bucket_id => "test-bucket"
        json_key_file => "/etc/logstash/credentials.json"
        codec => "json_lines"
      }
    }

    output {
      opensearch {
        hosts => "https://search-<domain>.us-east-1.es.amazonaws.com:443/"
        user => "admin"
        password => "admin"
        index => "logstash-test-1"
        ssl => true
        ssl_certificate_verification => false
      }
    }

But still no luck pushing the files. Here are the logs; glad the Elasticsearch license-checker errors are gone:

[2024-06-19T09:15:12,738][INFO ][logstash.outputs.opensearch][main] Using a default mapping template {:version=>2, :ecs_compatibility=>:v8}
[2024-06-19T09:15:13,831][INFO ][logstash.javapipeline    ][main] Pipeline Java execution initialization time {"seconds"=>1.11}
[2024-06-19T09:15:14,084][INFO ][logstash.inputs.googlecloudstorage][main] ProcessedDb created in: /usr/share/logstash/data/plugins/inputs/google_cloud_storage/db
[2024-06-19T09:15:14,085][INFO ][logstash.inputs.googlecloudstorage][main] Turn on debugging to explain why blobs are filtered.
[2024-06-19T09:15:14,087][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2024-06-19T09:15:14,091][INFO ][logstash.inputs.googlecloudstorage][main][6278fa388e5b5004f390348cab6962e1c49ff5ef2e012a1436a636cecb12a3c8] Fetching blobs from test-bucket
[2024-06-19T09:15:14,102][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2024-06-19T09:16:14,092][INFO ][logstash.inputs.googlecloudstorage][main][6278fa388e5b5004f390348cab6962e1c49ff5ef2e012a1436a636cecb12a3c8] Fetching blobs from test-bucket
[2024-06-19T09:17:14,091][INFO ][logstash.inputs.googlecloudstorage][main][6278fa388e5b5004f390348cab6962e1c49ff5ef2e012a1436a636cecb12a3c8] Fetching blobs from test-bucket
[2024-06-19T09:18:14,091][INFO ][logstash.inputs.googlecloudstorage][main][6278fa388e5b5004f390348cab6962e1c49ff5ef2e012a1436a636cecb12a3c8] Fetching blobs from test-bucket

It says it is fetching, but I can't tell whether the blobs are being fetched and then filtered out, or not fetched at all.

@RandomDD What are the names of the files in the GCS bucket that you're trying to fetch?

File names are of this format: 2021-05-10T02:20:08Z, where each file contains JSON logs pertaining to that timestamp.

Also, the same setup works fine when I use Filebeat to read from GCS and Logstash to write to OpenSearch, but I want to get rid of the intermediate Filebeat if Logstash alone can read from GCS. Also, is my assumption correct that Filebeat cannot write directly to OpenSearch when the input is GCS?

@RandomDD I've placed a .txt file in my GCS bucket and added the file_matches option to the input:

input {
  google_cloud_storage {
    bucket_id => "pw_source_bucket"
    json_key_file => "/usr/share/logstash/config/credentials.json"
    file_matches => ".*txt"
    codec => "json_lines"
  }
}
output {
  opensearch {
    index => "logstash-%{+YYYY.MM.dd}"
    hosts => ["https://docker3.pablo.net:9200"]
    user => "admin"
    password => "Eliatra123"
    ssl => true
    ssl_certificate_verification => false
    action => "create"
  }
}

This was my example document in the GCS bucket:

[2024-06-19T09:44:25,350][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-cluster-master-0] Test with .txt
[2024-06-19T09:44:25,350][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-cluster-master-0] File /usr/share/opensearch/config/esnode-key.pem has insecure file permissions (should be 0600)
[2024-06-19T09:44:25,350][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-cluster-master-0] File /usr/share/opensearch/config/securityadmin_demo.sh has insecure file permissions (should be 0600)
[2024-06-19T09:44:25,350][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-cluster-master-0] File /usr/share/opensearch/config/esnode.pem has insecure file permissions (should be 0600)
[2024-06-19T09:44:26,650][INFO ][o.o.p.c.c.PluginSettings ] [opensearch-cluster-master-0] Config: metricsLocation: /dev/shm/performanceanalyzer/, metricsDeletionInterval: 1, httpsEnabled: false, cleanup-metrics-db-files: true, batch-metrics-retention-period-minutes: 7, rpc-port: 9650, webservice-port 9600
[2024-06-19T09:44:27,669][INFO ][o.o.i.r.ReindexPlugin    ] [opensearch-cluster-master-0] ReindexPlugin reloadSPI called
[2024-06-19T09:44:27,670][INFO ][o.o.i.r.ReindexPlugin    ] [opensearch-cluster-master-0] Unable to find any implementation for RemoteReindexExtension
[2024-06-19T09:44:27,724][INFO ][o.o.j.JobSchedulerPlugin ] [opensearch-cluster-master-0] Loaded scheduler extension: opendistro_anomaly_detector, index: .opendistro-anomaly-detector-jobs
[2024-06-19T09:44:27,766][INFO ][o.o.j.JobSchedulerPlugin ] [opensearch-cluster-master-0] Loaded scheduler extension: reports-scheduler, index: .opendistro-reports-definitions
[2024-06-19T09:44:27,768][INFO ][o.o.j.JobSchedulerPlugin ] [opensearch-cluster-master-0] Loaded scheduler extension: opendistro-index-management, index: .opendistro-ism-config
[2024-06-19T09:44:27,769][INFO ][o.o.j.JobSchedulerPlugin ] [opensearch-cluster-master-0] Loaded scheduler extension: scheduler_geospatial_ip2geo_datasource, index: .scheduler-geospatial-ip2geo-datasource
[2024-06-19T09:44:27,772][INFO ][o.o.j.JobSchedulerPlugin ] [opensearch-cluster-master-0] Loaded scheduler extension: opensearch_sap_job, index: .opensearch-sap--job
[2024-06-19T09:44:27,835][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [aggs-matrix-stats]
[2024-06-19T09:44:27,835][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [analysis-common]
[2024-06-19T09:44:27,835][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [cache-common]
[2024-06-19T09:44:27,835][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [geo]
[2024-06-19T09:44:27,835][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [ingest-common]
[2024-06-19T09:44:27,836][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [ingest-geoip]
[2024-06-19T09:44:27,836][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [ingest-user-agent]
[2024-06-19T09:44:27,836][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [lang-expression]
[2024-06-19T09:44:27,836][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [lang-mustache]
[2024-06-19T09:44:27,836][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [lang-painless]
[2024-06-19T09:44:27,836][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [mapper-extras]
[2024-06-19T09:44:27,836][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [opensearch-dashboards]
[2024-06-19T09:44:27,837][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [parent-join]
[2024-06-19T09:44:27,837][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [percolator]
[2024-06-19T09:44:27,837][INFO ][o.o.p.PluginsService     ] [opensearch-cluster-master-0] loaded module [rank-eval]

@pablo Interesting; my files in the bucket contain single-line JSON logs and have no file extension, though. I can try importing a file with an extension, and change the config to be similar to yours, to see if that works.

@pablo One question: do we know whether Filebeat can write directly to OpenSearch without needing Logstash? I want to explore that too, but when I tried, it threw an error stating it does not have the required output plugin. So I am assuming that Beats can only be used in conjunction with Logstash as far as OpenSearch is concerned. Is my understanding right, or am I missing something here? Also, I need Beats newer than version 8.5 to support the GCS input.

@RandomDD As per the OS documentation, OpenSearch supports Filebeat only up to version 7.12.1.

Yeah, I went through the docs; the supported version does not support GCS. Anyway, I added file_matches => ".*" to the config and it started working. It seems the default value is .*\.log(\.gz)?, so it did not pick up my files, which have no .log extension. Thanks @pablo
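For reference, a sketch of the fixed input block (same bucket and key file as in my original config):

input {
  google_cloud_storage {
    bucket_id => "test-bucket"
    json_key_file => "/etc/logstash/credentials.json"
    # the default file_matches of ".*\.log(\.gz)?" skips files without a .log
    # extension (like 2021-05-10T02:20:08Z), so match everything instead
    file_matches => ".*"
    codec => "json_lines"
  }
}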
