Describe the issue:
I want to ingest JSON documents from an S3 folder. I created a domain and a pipeline in the AWS console and configured the pipeline using the "S3 scan" blueprint, selecting "json" as the codec.
The pipeline reads each JSON file and then reports that it failed to find any records. As a result, 0 documents are ingested into the index when I check the dashboard. Does anyone have a solution?
Configuration:
version: "2"
test-pipeline:
  source:
    s3:
      codec:
        json:
      compression: "none"
      aws:
        region: "eu-central-1"
        sts_role_arn: "arn:aws:iam::XXXXX"
      scan:
        buckets:
          - bucket:
              name: "XXXXX"
              filter:
                include_prefix:
                  - "XXXXX"
  sink:
    - opensearch:
        hosts:
          - "XXXXX"
        aws:
          sts_role_arn: "XXXXX"
          region: "eu-central-1"
          serverless: false
        index: "test-index"
        dlq:
          s3:
            bucket: "XXXXX"
            region: "XXXXX"
            sts_role_arn: "XXXXX"
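For context, my understanding from the Data Prepper documentation is that the `json` codec parses each S3 object as a JSON array and emits one event per array element, so a file whose top level is a single object or newline-delimited JSON may yield no records. A minimal sketch of the file shape I believe the `json` codec expects (field names here are made up for illustration):

```json
[
  { "id": 1, "message": "first record" },
  { "id": 2, "message": "second record" }
]
```

If the files are newline-delimited instead, the `newline` codec (optionally combined with a `parse_json` processor) may be the better fit, but I have not confirmed this for my setup.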
Relevant Logs or Screenshots:
2024-10-17T09:35:07.390 [s3-source-scan-1] INFO org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Read S3 object: [bucketName=X, key=X.json]
2024-10-17T09:35:07.444 [s3-source-scan-1] WARN org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Failed to find any records in S3 object: s3ObjectReference=[bucketName=X, key=X.json].