When indexing ~4000 documents (two vectors per document) into 12 different indices in Elasticsearch with the _bulk update API, a small number of indices enter a red state. Cluster resources (CPU and memory) do not seem stressed. Is there any advice on indexing vectors into multiple indices on the same cluster concurrently?
UPDATE: This is only occurring when we try writing to indices restored from a snapshot. Everything is fine if we rebuild the indices from scratch.
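For readers unfamiliar with the request shape, here is a minimal sketch of how a _bulk body with update actions spanning several indices might be assembled. The index names, document IDs, and the field names vector_1 and vector_2 are placeholders rather than the actual mapping, and the use of doc_as_upsert is an assumption about how the updates are sent.

```python
import json


def build_bulk_update_body(docs_by_index):
    """Build an NDJSON body for POST /_bulk using update actions.

    docs_by_index maps an index name to {doc_id: fields}. All names are
    hypothetical placeholders for illustration only.
    """
    lines = []
    for index_name, docs in docs_by_index.items():
        for doc_id, fields in docs.items():
            # Action line: which index and document the update targets.
            lines.append(json.dumps({"update": {"_index": index_name, "_id": doc_id}}))
            # Payload line: partial document, created if it does not exist yet.
            lines.append(json.dumps({"doc": fields, "doc_as_upsert": True}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


example_body = build_bulk_update_body(
    {
        "index_a": {"1": {"vector_1": [0.1, 0.2], "vector_2": [0.3, 0.4]}},
        "index_b": {"7": {"vector_1": [0.5, 0.6], "vector_2": [0.7, 0.8]}},
    }
)
```

A single body like this can target several indices in one request, which is one way the concurrent multi-index writes described above could arise.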
This forum is for Open Distro only. Would you mind emailing me at bpavani@amazon.com with details? We can get someone from the service team to help you out.
I followed up with @bpavani via email. The exception we’re seeing is a bit different than the one in the specified issue. It’s a merge exception during shard allocation.
[2020-09-02T06:47:07,602][WARN ][o.e.i.c.IndicesClusterStateService] [dfe88a6a13565ff8c940025d7f324a3c] [[catalog_6_c3749655-ff83-48aa-973e-7ab1cad7d51b_fds][0]] marking and sending shard failed due to [shard failure, reason [merge failed]]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.NullPointerException
at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2310) ~[elasticsearch-7.1.1.jar:7.1.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:760) ~[elasticsearch-7.1.1.jar:7.1.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.1.1.jar:7.1.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.NullPointerException AMAZON_INTERNAL
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:152) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4459) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4054) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101) ~[elasticsearch-7.1.1.jar:7.1.1]
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
Caused by: java.lang.RuntimeException: java.lang.NullPointerException AMAZON_INTERNAL AMAZON_INTERNAL AMAZON_INTERNAL
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:152) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4459) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4054) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101) ~[elasticsearch-7.1.1.jar:7.1.1]
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
Caused by: java.lang.NullPointerException
at org.apache.lucene.index.SegmentDocValuesProducer.getBinary(SegmentDocValuesProducer.java:103) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56] AMAZON_INTERNAL AMAZON_INTERNAL AMAZON_INTERNAL
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:152) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4459) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4054) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101) ~[elasticsearch-7.1.1.jar:7.1.1]
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
I followed up with @bpavani via email. The exception we’re seeing is a bit different. It’s a merge exception on shard allocation. Based on the stack trace below, I don’t think our issues are related.
[2020-09-02T06:56:58,697][WARN ][o.e.i.c.IndicesClusterStateService] [dfe88a6a13565ff8c940025d7f324a3c] [[SOME_INDEX][0]] marking and sending shard failed due to [shard failure, reason [merge failed]]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.NullPointerException
at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2310) ~[elasticsearch-7.1.1.jar:7.1.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:760) ~[elasticsearch-7.1.1.jar:7.1.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.1.1.jar:7.1.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.NullPointerException AMAZON_INTERNAL
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:152) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4459) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4054) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101) ~[elasticsearch-7.1.1.jar:7.1.1]
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
Caused by: java.lang.RuntimeException: java.lang.NullPointerException AMAZON_INTERNAL AMAZON_INTERNAL AMAZON_INTERNAL
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:152) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4459) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4054) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101) ~[elasticsearch-7.1.1.jar:7.1.1]
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
Caused by: java.lang.NullPointerException
at org.apache.lucene.index.SegmentDocValuesProducer.getBinary(SegmentDocValuesProducer.java:103) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56] AMAZON_INTERNAL AMAZON_INTERNAL AMAZON_INTERNAL
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:152) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4459) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4054) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101) ~[elasticsearch-7.1.1.jar:7.1.1]
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) ~[lucene-core-8.0.0.jar:8.0.0-SNAPSHOT f754f8b0b8588981b899b802b6b5b14806325d78 - akjain - 2020-02-17 14:46:56]
We are treating this as a bug and are working on it. My guess is that you have two vector fields defined in the index and that some documents may not contain both fields.
Workaround:
Have one vector field per index.
If you plan to keep more than one vector field, make sure all the vector fields are present in every document.
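The second workaround can be enforced on the client side before documents are sent. The following is only a sketch: it assumes documents are plain dicts with top-level vector fields, and the field names are placeholders rather than the real mapping.

```python
def missing_vector_fields(doc, vector_fields):
    """Return the vector fields that are absent or null in doc."""
    return [name for name in vector_fields if doc.get(name) is None]


def split_by_completeness(docs, vector_fields):
    """Separate documents that carry every vector field from those that do not."""
    complete, incomplete = [], []
    for doc in docs:
        if missing_vector_fields(doc, vector_fields):
            incomplete.append(doc)
        else:
            complete.append(doc)
    return complete, incomplete


# Hypothetical field names and documents, for illustration only.
VECTOR_FIELDS = ("vector_1", "vector_2")
docs = [
    {"id": "a", "vector_1": [0.1], "vector_2": [0.2]},
    {"id": "b", "vector_1": [0.3]},
]
complete, incomplete = split_by_completeness(docs, VECTOR_FIELDS)
```

Documents in the incomplete list could then be fixed up or held back instead of being indexed with a vector field missing.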
@vamshin We have two vector fields defined in a dynamic index template, so an index could have more than two vector fields (product.{dynamic}.vector_1 and product.{dynamic}.vector_2), but if product.{dynamic}.vector_1 is defined, product.{dynamic}.vector_2 should be as well. I can check whether the offending indices have vector_1 defined but not vector_2, or vice versa.
UPDATE:
In the offending indices, the vectors are always both defined (they are retrieved and indexed in tandem). So we have documents with either both vectors defined or no vectors defined.
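For anyone who wants to run the same check on their own data, here is a rough sketch of how documents could be classified by vector presence. It assumes each document nests its vectors as product.{dynamic}.vector_1 and product.{dynamic}.vector_2 in a plain dict; the function name and sample data are made up for illustration.

```python
def classify_vector_presence(doc, root="product", pair=("vector_1", "vector_2")):
    """Group the dynamic keys under root by how many of the paired vectors exist.

    Returns a dict with three lists of keys: "both", "partial", and "none".
    """
    result = {"both": [], "partial": [], "none": []}
    for key, sub in sorted(doc.get(root, {}).items()):
        if not isinstance(sub, dict):
            continue  # skip scalar values stored directly under root
        count = sum(1 for name in pair if sub.get(name) is not None)
        if count == len(pair):
            result["both"].append(key)
        elif count == 0:
            result["none"].append(key)
        else:
            result["partial"].append(key)
    return result


# Hypothetical document, for illustration only.
sample = {
    "product": {
        "shoes": {"vector_1": [0.1], "vector_2": [0.2]},
        "hats": {"vector_1": [0.3]},
        "bags": {"title": "no vectors here"},
    }
}
presence = classify_vector_presence(sample)
```

In our case the "partial" list would be empty for every document, while both the "both" and "none" cases occur.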
I wanted to provide some clarity for discussion participants (and future readers). This seems to be an issue for any existing index, not just indices created from snapshots.