Versions
opensearch-2.18.0-1.x86_64
opensearch-data-prepper-jdk-2.10.2-linux-x64
Describe the issue:
Two ingest nodes running Data Prepper read a Kafka topic of simple log data (dummy logs with a @timestamp).
When I run only ingest_1, everything works, and throughput tops out at about 30K logs/sec.
When I add the second ingest node, ingest_2, throughput drops.
I need two ingest nodes to meet an availability requirement.
Why does adding a second Data Prepper ingest node reduce throughput?
How can I fix it?
Configuration:
kafka: 10.10.10.100:9092
cluster: 6 data, 2 master & 2 ingest nodes.
2 ingest nodes:
ingest_1: 4 vCPU, 10 GB RAM, 10.10.10.1
ingest_2: 2 vCPU, 10 GB RAM, 10.10.10.2
Relevant configuration:
ingest_1 and ingest_2 use the same configuration, except for the workers setting.
pipelines.yaml
---
kafka-pipeline:
  workers: 8 # 4 for ingest_2
  delay: "50"
  source:
    kafka:
      bootstrap_servers:
        - 10.10.10.100:9092
      topics:
        - name: pipeline-fluentbit
          group_id: opensearch-data-prepper
          auto_offset_reset: "earliest"
          auto_commit: true
      encryption:
        type: "none"
        insecure: true
  processor:
    - parse_json:
        source: message
        delete_source: true
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    - opensearch:
        hosts:
          - "https://10.10.10.1:9200"
          - "https://10.10.10.2:9200"
        index: "pipeline-fluentbit"
        username: "admin"
        password: ""
        insecure: true
        batch_size: 5000
        flush_interval: "50"
---