Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
Just upgraded from OpenSearch 2.8.0 to 2.11.0.
Describe the issue :
I'm using the OpenSearch Python client library and its helpers.bulk function to index documents. Ever since upgrading to OpenSearch 2.11.0, nearly every file I try to index fails with the error below:
"RequestError(400, 'json_parse_exception', 'Illegal character ((CTRL-CHAR, code XX)): only regular white space (\r, \n, \t) is allowed between tokens"
These are the same files I was able to index successfully on OpenSearch 2.8.0, using the same ingestion scripts and the same additional Python libraries. Did something change in the OpenSearch JSON parser? If not, how can I make sure CTRL characters are escaped?
Not sure if it matters, but I'm using Pandas to chunk the files and bulk upload them. Again, all libraries and scripts are unchanged, and OpenSearch has no new settings. The only change in the environment was the upgrade.
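On the escaping question, here is a minimal sketch. It assumes the client serializes each document with Python's standard `json` module, which is an assumption about the client internals and is not confirmed in this thread. If that holds, control characters inside string values are already escaped during serialization, so a raw CTRL-CHAR "between tokens" more likely concerns the request body as a whole than the field contents. The sample document is hypothetical.

```python
import json

# Hypothetical document with a raw control character (0x1F) in a field value.
doc = {"message": "before\x1fafter"}

# json.dumps escapes control characters inside strings as \uXXXX sequences.
payload = json.dumps(doc)
print(payload)

# Collect any raw control characters left in the serialized text
# (other than the whitespace the parser accepts between tokens).
raw_ctrl = [c for c in payload if ord(c) < 0x20 and c not in "\r\n\t"]
print(raw_ctrl)  # → []
```

Running the same check over a few serialized rows from a Pandas chunk would show whether the data itself can be ruled out.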
Configuration :
Relevant Logs or Screenshots :
Dalador
November 3, 2023, 12:54pm
Got the same issue.
OS version 2.11.0, opensearch-py==2.3.2
Any workarounds?
pablo
November 3, 2023, 1:06pm
@Dalador @jthomas87 Did you have a look at this GitHub issue?
opened 07:53PM - 20 Oct 23 UTC
closed 07:15PM - 26 Oct 23 UTC
bug
**Describe the bug**
I updated OS from 2.9.0 to 2.11.0 and binary information… appeared in the indexes:
```
[2023/10/20 09:53:38] [error] [output:opensearch:opensearch.0] HTTP status=400 URI=/_bulk, response:
{"error":{"root_cause":[{"type":"json_parse_exception","reason":"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: (byte[])\"\\u001F�\\u0008\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000��Vmo�6\\u0010��_a�si�]�1\rs\\u001D%\\u0015�7XJ�n)\\u000C��<.\\u0012����4�\\u007F�Qr\\u001Ags\\u0002\\u000CӇ��{���!�\\u00147t8}\\u0018�\\u0018���t�SQ��7(c\r�)jq��������UT5��\\u0001e\\u0019��L\\u0003YFjL��=����{�\r�\\u000Fw\\u001F��t\\u0017_\\u0000ʇ+��؛L|�0��7݀\\u0016�\"; line: 1, column: 2]"}],"type":"json_parse_exception","reason":"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: (byte[])\"\\u001F�\\u0008\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000��Vmo�6\\u0010��_a�si�]�1\rs\\u001D%\\u0015�7XJ�n)\\u000C��<.\\u0012����4�\\u007F�Qr\\u001Ags\\u0002\\u000CӇ��{���!�\\u00147t8}\\u0018�\\u0018���t�SQ��7(c\r�)jq��������UT5��\\u0001e\\u0019��L\\u0003YFjL��=����{�\r�\\u000Fw\\u001F��t\\u0017_\\u0000ʇ+��؛L|�0��7݀\\u0016�\"; line: 1, column: 2]"},"status":400}
```
We use fluent-bit and it has a compression option: `Compress gzip`. Turning it off solved the problem. However, we can't permanently disable it because we have a lot of traffic and need to reduce its cost.
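The `code 31` in the error is consistent with the parser receiving a gzip stream where it expected JSON: 31 is 0x1F, the first byte of the gzip magic number, and the `\u001F` ... `\u0008` prefix in the log matches a gzip header (magic bytes followed by the deflate method byte). A minimal sketch that reproduces that prefix locally with Python's standard library; the sample bulk body is hypothetical.

```python
import gzip

# Hypothetical bulk request body (newline-delimited JSON).
body = b'{"index":{"_index":"logs"}}\n{"message":"hello"}\n'

compressed = gzip.compress(body)

# A gzip stream starts with the magic bytes 0x1F 0x8B, then 0x08 (deflate).
print(compressed[0], hex(compressed[1]), compressed[2])  # → 31 0x8b 8

# Decompressing restores the original JSON lines.
assert gzip.decompress(compressed) == body
```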
**Expected behavior**
Compression should work on version 2.11.0.
**Plugins**
```
opensearch-alerting 2.11.0.0
opensearch-anomaly-detection 2.11.0.0
opensearch-asynchronous-search 2.11.0.0
opensearch-cross-cluster-replication 2.11.0.0
opensearch-custom-codecs 2.11.0.0
opensearch-geospatial 2.11.0.0
opensearch-index-management 2.11.0.0
opensearch-job-scheduler 2.11.0.0
opensearch-knn 2.11.0.0
opensearch-ml 2.11.0.0
opensearch-neural-search 2.11.0.0
opensearch-notifications 2.11.0.0
opensearch-notifications-core 2.11.0.0
opensearch-observability 2.11.0.0
opensearch-performance-analyzer 2.11.0.0
opensearch-reports-scheduler 2.11.0.0
opensearch-security 2.11.0.0
opensearch-security-analytics 2.11.0.0
opensearch-sql 2.11.0.0
repository-s3 2.11.0
```
**Host/Environment (please complete the following information):**
- OS: Azure K8s Service (AKS) v.1.26.6 with Ubuntu nodes v.22.04
- Version: 2.11.0