We are using AWS OpenSearch for one of our applications. We have configured the ingest attachment processor to extract text from .docx files.
Here are our environment setup details:
Note: DEV and QA share the same instance, each using a different index name.
| Setting | DEV/QA | UAT |
|---|---|---|
| OpenSearch version | 1.1 | 1.1 |
| Service software version | R20220223-P6 | R20220928-P1 |
| Dedicated master node | No | Yes |
| Number of nodes | 2 | 3 |
| Instance type | m5.large.search | m6g.xlarge.search |
Flow: at indexing time, the Base64-encoded .docx file is sent through the ingest attachment processor, which extracts the text and saves it into a field in OpenSearch.
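To make the flow concrete, here is a minimal Python sketch of the two request bodies involved, assuming a single attachment processor plus a pipeline-level `on_failure` handler. The source field `form_data` and the pipeline name `attachment_dev` come from the error log in this post; the target field `attachment` (the processor's default) and the `attachment_error` field written by the handler are hypothetical choices for illustration, not necessarily our actual configuration.

```python
import base64


def build_pipeline(source_field: str = "form_data") -> dict:
    """Hypothetical ingest pipeline body: extract text, record failures instead of rejecting."""
    return {
        "description": "Extract text from Base64-encoded .docx files",
        "processors": [
            {
                "attachment": {
                    "field": source_field,         # field holding the Base64 string
                    "target_field": "attachment",  # extracted text lands in attachment.content
                }
            }
        ],
        # Pipeline-level handler: runs if any processor above throws.
        "on_failure": [
            {
                "set": {
                    "field": "attachment_error",  # hypothetical field name
                    "value": "{{ _ingest.on_failure_message }}",
                }
            }
        ],
    }


def build_document(docx_bytes: bytes, source_field: str = "form_data") -> dict:
    """Wrap raw .docx bytes as the Base64 text the attachment processor reads."""
    return {source_field: base64.b64encode(docx_bytes).decode("ascii")}
```

The pipeline body would be registered with `PUT _ingest/pipeline/attachment_dev`, and the document body is what Logstash (or a `_simulate` request) sends through that pipeline.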
Issue (inconsistent behavior): a specific .docx file cannot be parsed by the OpenSearch attachment processor, which throws the error below:
[logstash.outputs.amazonelasticsearch][main][] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>"25220", :_index=>"test_index", :_type=>"_doc", :_routing=>nil, :pipeline=>"attachment_dev"}, #<LogStash::Event:0x904adf8>], :response=>{"index"=>{"_index"=>"test_index", "_type"=>"_doc", "_id"=>"25220", "status"=>400, "error"=>{"type"=>"parse_exception", "reason"=>"Error parsing document in field [form_data]", "caused_by"=>{"type"=>"tika_exception", "reason"=>"TIKA-198: Illegal IOException from org.apache.tika.parser.ParserDecorator$2@298392f1", "caused_by"=>{"type"=>"i_o_exception", "reason"=>"No such file or directory"}}}}}.
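The useful detail in that error is the innermost cause, three `caused_by` levels down. A small helper like the sketch below flattens the chain so it can be compared across environments; it assumes the error object has already been parsed into a Python dict (the Logstash log prints it in Ruby hash syntax, so it is transcribed by hand here), and `cause_chain` is a hypothetical name.

```python
from typing import List


def cause_chain(error: dict) -> List[str]:
    """Flatten a nested OpenSearch error object into 'type: reason' lines, outermost first."""
    chain = []
    current = error
    while current:  # stop when there is no further "caused_by"
        chain.append(f"{current.get('type')}: {current.get('reason')}")
        current = current.get("caused_by")
    return chain


# The "error" object from the log above, transcribed as a Python dict.
logged_error = {
    "type": "parse_exception",
    "reason": "Error parsing document in field [form_data]",
    "caused_by": {
        "type": "tika_exception",
        "reason": "TIKA-198: Illegal IOException from "
        "org.apache.tika.parser.ParserDecorator$2@298392f1",
        "caused_by": {
            "type": "i_o_exception",
            "reason": "No such file or directory",
        },
    },
}

for line in cause_chain(logged_error):
    print(line)
```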
I use an `on_failure` configuration to handle this, but the behavior is inconsistent. When I simulate the text extraction (`_ingest/pipeline/<>/_simulate`) by providing the Base64 string in the request body:
- On DEV/QA: if I call it 5 times in a row, the text is extracted correctly 4 times; on the remaining call, no text is extracted and the `on_failure` handler runs.
- On UAT: if I call it 5 times in a row, the text is never extracted; the `on_failure` handler runs all 5 times.
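The repeated-call comparison can be scripted so both environments are tested the same way. This is a minimal sketch, assuming the `on_failure` handler writes a hypothetical `attachment_error` field and that extracted text appears under `attachment.content` (the processor's default target); `http_sender` also assumes the domain endpoint accepts unsigned requests, so a domain that requires SigV4-signed requests would need a signing client in its place. All function names here are hypothetical.

```python
import json
import urllib.request
from collections import Counter
from typing import Callable

Sender = Callable[[dict], dict]


def simulate_body(doc: dict) -> dict:
    """Request body for POST _ingest/pipeline/<id>/_simulate with a single document."""
    return {"docs": [{"_source": doc}]}


def classify(response: dict) -> str:
    """Label one _simulate response by what happened to its single document."""
    result = response["docs"][0]
    if "error" in result:
        return "error"  # the failure was not caught by any on_failure handler
    source = result["doc"]["_source"]
    if "attachment_error" in source:  # hypothetical field set by on_failure
        return "on_failure"
    if source.get("attachment", {}).get("content"):
        return "extracted"
    return "empty"


def run_trials(send: Sender, doc: dict, n: int = 5) -> Counter:
    """Send the same document n times and count the outcomes."""
    body = simulate_body(doc)
    return Counter(classify(send(body)) for _ in range(n))


def http_sender(endpoint: str, pipeline: str) -> Sender:
    """Plain HTTP sender; assumes the endpoint accepts unsigned requests."""
    url = f"{endpoint.rstrip('/')}/_ingest/pipeline/{pipeline}/_simulate"

    def send(body: dict) -> dict:
        request = urllib.request.Request(
            url,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return send
```

For example, `run_trials(http_sender("https://<domain-endpoint>", "attachment_dev"), {"form_data": b64_text})` could be run once against DEV/QA and once against UAT, where `<domain-endpoint>` and `b64_text` stand in for the real endpoint and the Base64 string of the failing file. Injecting `send` keeps the counting logic testable without a live cluster.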
The ingest attachment configuration and plugin version (1.1.0) are the same in both environments. Kindly help me resolve this issue.