Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.17.1
Describe the issue:
I’ve tested the advantages of quantization with two groups of indices:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open quantized.comment_2021 U7GNCo1qStmGOf7PZksTAg 3 1 132124 60 1.8gb 975.9mb
green open quantized.comment_2022 Dhz8w2PpTBaAMUT_BzeG9w 3 1 101407 0 1.4gb 734.9mb
green open quantized.comment_2023 qfWsD3zuS1Wz16QA4G-xsw 3 1 105771 0 1.7gb 895.3mb
green open comment_2021 D7KJLcnNRa-7UHx52LGyEA 3 1 132124 7243 2.4gb 1.2gb
green open comment_2022 zCBcgeArRuyD-3zCCTlCMw 3 1 101716 1301 1.7gb 894.1mb
green open comment_2023 Ytf7vWPKT6OzR5LtEHoyDw 3 1 105771 21041 2.3gb 1.2gb
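As a quick back-of-the-envelope check on the listing above, comparing total store.size between each quantized index and its default counterpart gives the actual on-disk savings (my own arithmetic from the numbers shown):

```python
# Total store.size in GB from the _cat/indices output above: (quantized, default)
sizes = {
    "comment_2021": (1.8, 2.4),
    "comment_2022": (1.4, 1.7),
    "comment_2023": (1.7, 2.3),
}

for name, (quantized, default) in sizes.items():
    saving = (default - quantized) / default * 100
    print(f"{name}: {saving:.1f}% smaller on disk")
# → comment_2021: 25.0%, comment_2022: 17.6%, comment_2023: 26.1%
```

So the observed disk reduction is roughly 17-26% across the three indices.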
The only difference between them is the index mapping: whether knn.mode is on_disk (for quantization) or the default in-memory HNSW method.
"COMMENT": {
"type": "nested",
"properties": {
"knn": {
"mode": "on_disk",
"space_type": "innerproduct",
"data_type": "float",
"dimension": 768,
"type": "knn_vector"
}
}
}
"COMMENT": {
"type": "nested",
"properties": {
"knn": {
"method": {
"engine": "faiss",
"space_type": "l2",
"name": "hnsw",
"parameters": {
"ef_construction": 128,
"m": 16
}
},
"dimension": 768,
"type": "knn_vector"
}
}
}
At first, I expected the stored files for the COMMENT.knn fields to differ, but they don't. The only significant difference in the stored indices (e.g. /usr/share/opensearch/data/nodes/0/indices/D7KJLcnNRa-7UHx52LGyEA/2/index) is the presence of NativeEngines990KnnVectorsFormat_0.osknnqstate files.
- With these files, do OpenSearch's KNN990QuantizationStateWriter and KNN990QuantizationStateReader help data nodes index and search documents with less memory overhead and lower latency?
- According to Disk-based vector search - OpenSearch Documentation, disk-based vector search uses binary quantization, compressing vectors and thereby reducing the memory requirements. But the disk-size reduction is only about 25%, roughly the same as a byte vector (8-bit integer quantization).
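If I understand the documentation correctly, the advertised 32x compression applies to the in-memory representation: on_disk mode still keeps the full-precision float32 vectors on disk for rescoring, and only the binary-quantized copy that gets loaded into memory is compressed. A quick per-vector calculation for the dimension 768 used in the mappings above (my own arithmetic, not from the docs):

```python
DIM = 768  # vector dimension from the mappings above

float32_bytes = DIM * 4   # full-precision float32 vector
int8_bytes = DIM * 1      # byte vector (8-bit integer quantization)
binary_bytes = DIM // 8   # binary quantization: 1 bit per dimension

print(f"float32 per vector: {float32_bytes} B")   # 3072 B
print(f"int8 per vector:    {int8_bytes} B")      # 768 B
print(f"binary per vector:  {binary_bytes} B")    # 96 B
print(f"in-memory compression vs float32: {float32_bytes // binary_bytes}x")  # 32x
```

If the float32 vectors remain on disk for rescoring, the modest ~25% store.size saving would come from other on-disk structures, while the full 32x reduction would show up in memory usage rather than in store.size. This is my reading of the docs, though, and part of what I'd like confirmed.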