Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch v2.7.0
Describe the issue:
I recently ingested a large dataset into an OpenSearch cluster using AWS’s OpenSearch Ingestion service (which is built on Data Prepper). The source was an S3 bucket containing partitioned JSON data, with ingestion triggered by SQS messages; the sink was my cluster; and the only processors were “parse_json” and “delete_entries” (to remove one key from every object). I have used the same pipeline configuration successfully in the past with OpenSearch v2.5.
For some reason, after ingestion, text such as “van bühl” was stored as “van b�hl”. It appears that every non-ASCII character (for example ä or ó) was replaced with the Unicode replacement character � (U+FFFD). This affected text in all fields.
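One common way to get exactly this symptom is a byte-level encoding mismatch: text written in a single-byte encoding such as ISO-8859-1 (Latin-1) is later read by something that assumes UTF-8 and substitutes invalid byte sequences with U+FFFD. The sketch below reproduces that locally. It is only an assumption that this is what happened here; the Latin-1 encoding and the helper `is_valid_utf8` are hypothetical and are not part of OpenSearch Ingestion or Data Prepper.

```python
# Hypothetical reproduction: ASSUMES the source objects were written as
# Latin-1 (ISO-8859-1) bytes instead of UTF-8.
original = "van bühl"

# How the text would look on disk if the producer used Latin-1:
# "ü" becomes the single byte 0xFC.
latin1_bytes = original.encode("latin-1")

# A reader that assumes UTF-8 and replaces invalid sequences turns
# the stray 0xFC byte into U+FFFD.
decoded = latin1_bytes.decode("utf-8", errors="replace")
print(decoded)  # prints: van b�hl


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def contains_replacement_char(data: bytes) -> bool:
    """True if the bytes already contain U+FFFD encoded as UTF-8 (EF BF BD)."""
    return b"\xef\xbf\xbd" in data


print(is_valid_utf8(latin1_bytes))              # False
print(is_valid_utf8(original.encode("utf-8")))  # True
```

Running these checks on a few objects downloaded from the S3 bucket could narrow things down. If `is_valid_utf8` is False, the source files are likely not UTF-8. If it is True but `contains_replacement_char` is also True, the � characters were probably already present before ingestion.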
Any idea why this might have happened? Any help is much appreciated.