Integer Overflow occurs when building a large-scale vector index

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch 3.3.2

Describe the issue:

I built a vector index (faiss engine, HNSW + BQ, 1024 dimensions per vector) with memory_optimized_search enabled, holding over 10 billion vectors in total. The index settings are as follows:

"settings": {
    "index": {
        "knn": true,
        "knn.memory_optimized_search": true,
        "replication.type": "SEGMENT",
        "number_of_shards": 232,
        "number_of_replicas": 0
    }
}

After inserting all the data, I performed a force_merge to reduce the index to one segment per shard (approx. 43 million documents per shard). After that, searching the index with a query vector failed with an "integer overflow" error.
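(For context, the merge was a standard force merge request; the index name here is illustrative:)

```
POST /my-vector-index/_forcemerge?max_num_segments=1
```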

The full stack looks like:

Caused by: org.opensearch.core.common.io.stream.NotSerializableExceptionWrapper: arithmetic_exception: integer overflow
        at java.lang.Math.toIntExact(Math.java:1374) ~[?:?]
        at org.opensearch.knn.memoryoptsearch.faiss.MonotonicIntegerSequenceEncoder.encode(MonotonicIntegerSequenceEncoder.java:63) ~[?:?]
        at org.opensearch.knn.memoryoptsearch.faiss.FaissHNSW.load(FaissHNSW.java:77) ~[?:?]
        at org.opensearch.knn.memoryoptsearch.faiss.binary.FaissBinaryHnswIndex.doLoad(FaissBinaryHnswIndex.java:51) ~[?:?]
        at org.opensearch.knn.memoryoptsearch.faiss.FaissIndex.load(FaissIndex.java:55) ~[?:?]
        at org.opensearch.knn.memoryoptsearch.faiss.FaissIdMapIndex.doLoad(FaissIdMapIndex.java:58) ~[?:?]
        at org.opensearch.knn.memoryoptsearch.faiss.FaissIndex.load(FaissIndex.java:55) ~[?:?]
        at org.opensearch.knn.memoryoptsearch.faiss.FaissMemoryOptimizedSearcher.<init>(FaissMemoryOptimizedSearcher.java:49) ~[?:?]
        at org.opensearch.knn.memoryoptsearch.faiss.FaissMemoryOptimizedSearcherFactory.createVectorSearcher(FaissMemoryOptimizedSearcherFactory.java:39) ~[?:?]
        at org.opensearch.knn.index.codec.KNN990Codec.NativeEngines990KnnVectorsReader.lambda$getVectorSearcherSupplier$0(NativeEngines990KnnVectorsReader.java:372) ~[?:?]
        at org.opensearch.knn.index.codec.KNN990Codec.NativeEngines990KnnVectorsReader.loadMemoryOptimizedSearcherIfRequired(NativeEngines990KnnVectorsReader.java:325) ~[?:?]
        at org.opensearch.knn.index.codec.KNN990Codec.NativeEngines990KnnVectorsReader.trySearchWithMemoryOptimizedSearch(NativeEngines990KnnVectorsReader.java:252) ~[?:?]
        at org.opensearch.knn.index.codec.KNN990Codec.NativeEngines990KnnVectorsReader.search(NativeEngines990KnnVectorsReader.java:196) ~[?:?]
        at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader.search(PerFieldKnnVectorsFormat.java:324) ~[lucene-core-10.3.1.jar:10.3.1 51190f35a16d2ce433139abfe0fd8365791b352a - 2025-10-02 09:50:16]
        at org.opensearch.knn.index.query.memoryoptsearch.MemoryOptimizedKNNWeight.queryIndex(MemoryOptimizedKNNWeight.java:202) ~[?:?]
        at org.opensearch.knn.index.query.memoryoptsearch.MemoryOptimizedKNNWeight.doANNSearch(MemoryOptimizedKNNWeight.java:100) ~[?:?]
        at org.opensearch.knn.index.query.KNNWeight.approximateSearch(KNNWeight.java:505) ~[?:?]
        at org.opensearch.knn.index.query.KNNWeight.searchLeaf(KNNWeight.java:336) ~[?:?]
        at org.opensearch.knn.index.query.nativelib.NativeEngineKnnVectorQuery.searchLeaf(NativeEngineKnnVectorQuery.java:438) ~[?:?]
        at org.opensearch.knn.index.query.nativelib.NativeEngineKnnVectorQuery.lambda$doSearch$0(NativeEngineKnnVectorQuery.java:272) ~[?:?]

The failure comes from this line in the MonotonicIntegerSequenceEncoder.encode function:
final long value = Math.toIntExact(input.readLong());
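For reference, Math.toIntExact throws ArithmeticException("integer overflow") whenever the long it is given does not fit in 32 bits, which matches the exception in the stack trace. A minimal standalone demonstration:

```java
public class ToIntExactDemo {
    public static void main(String[] args) {
        // Values that fit in an int pass through unchanged;
        // a per-shard doc count of ~43 million fits easily.
        System.out.println(Math.toIntExact(43_000_000L)); // prints 43000000

        // Anything above Integer.MAX_VALUE (2_147_483_647) throws.
        try {
            Math.toIntExact(2_147_483_648L); // Integer.MAX_VALUE + 1
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // prints "integer overflow"
        }
    }
}
```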

Is there any solution to this problem? It looks like a bug when indexing large numbers of vectors with memory_optimized_search enabled.

After analyzing the code, my guess is that an individual FAISS HNSW segment cannot exceed Integer.MAX_VALUE vectors; otherwise the error above occurs.

However, each segment holds only about 43 million vectors in my experiment, so based on the code it should not trigger this integer overflow. Please correct me if I'm wrong.
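One hypothesis (mine, not confirmed against the k-NN source): if the longs fed to Math.toIntExact are byte offsets into the FAISS file rather than vector ordinals, then even 43 million vectors can overflow an int, since a binary-quantized 1024-dimension vector occupies 128 bytes. A quick back-of-envelope check:

```java
public class OffsetOverflowCheck {
    public static void main(String[] args) {
        long vectorsPerSegment = 43_000_000L; // approx. docs per shard after force_merge
        long codeBytes = 1024 / 8;            // 1024-dim binary-quantized code = 128 bytes

        // Cumulative byte offset of the last vector's code in the segment.
        long maxByteOffset = vectorsPerSegment * codeBytes;
        System.out.println(maxByteOffset);                     // prints 5504000000
        System.out.println(maxByteOffset > Integer.MAX_VALUE); // prints true
    }
}
```

So the vector count alone staying under Integer.MAX_VALUE would not be enough if offsets are what get narrowed.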

Hi @Maihj
This seems to be a bug in the Memory Optimized Search code. Let me create a GitHub issue for this and we can continue the discussion there.

GH issue: [BUG] Integer Overflow occurs when searching a large-scale vector index · Issue #3108 · opensearch-project/k-NN · GitHub


thanks @Navneet