Migrating and using kNN indexes from OpenDistro to OpenSearch

I have successfully upgraded the cluster directly from OpenDistro 1.12 to OpenSearch 1.0.0 with a rolling upgrade. The cluster and indices are green.

All my indices are kNN indices, and after the upgrade I can't search them. Right now I get the following error:

{'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': '[KNN] Invalid engine type: null'}], 'type': 'search_phase_execution_exception', 'reason': 'all shards failed', 'phase': 'query', 'grouped': True}}

After researching the documentation (Approximate search - OpenSearch documentation), it became clear that OpenSearch has new parameters for kNN indices. I tried to update the mapping for my indices and add these parameters:

"name": "hnsw",
"engine": "nmslib",

So I sent the following PUT request to the index's _mapping endpoint:

{
  "properties": {
      "my_vector":{
          "type": "knn_vector",
          "dimension": 512,
          "method": {
            "name": "hnsw",
            "space_type": "l2",
            "engine": "nmslib",
            "parameters": {
              "ef_construction": 1024,
              "m": 16
            }
          }
      }
  }
}

and got the following answer:

{
    "error": {
        "root_cause": [
            {
                "type": "null_pointer_exception",
                "reason": "Cannot invoke \"org.opensearch.knn.index.KNNMethodContext.getMethodComponent()\" because \"m\" is null"
            }
        ],
        "type": "null_pointer_exception",
        "reason": "Cannot invoke \"org.opensearch.knn.index.KNNMethodContext.getMethodComponent()\" because \"m\" is null"
    },
    "status": 500
}

But if I change “my_vector” to “my_vector3”, everything will be OK:

{
  "properties": {
      "my_vector3":{
          "type": "knn_vector",
          "dimension": 512,
          "method": {
            "name": "hnsw",
            "space_type": "l2",
            "engine": "nmslib",
            "parameters": {
              "ef_construction": 1024,
              "m": 16
            }
          }
      }
  }
}
{
    "acknowledged": true
}

Here is the setup for my index:
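(This is the full dump of aliases, mappings, and settings, retrieved with something like GET /index_data_m16_ef1024?pretty.)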

{
  "index_data_m16_ef1024": {
    "aliases": {},
    "mappings": {
      "_source": {
        "excludes": [
          "my_vector"
        ]
      },
      "properties": {
        "cam_id": {
          "type": "integer"
        },
        "detect_id": {
          "type": "long"
        },
        "my_vector": {
          "type": "knn_vector",
          "dimension": 512
        },
        "my_vector2": {
          "type": "knn_vector",
          "dimension": 512,
          "method": {
            "engine": "nmslib",
            "space_type": "cosinesimil",
            "name": "hnsw",
            "parameters": {
              "ef_construction": 1024,
              "m": 16
            }
          }
        },
        "time_check": {
          "type": "date",
          "store": true,
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "-1",
        "translog": {
          "flush_threshold_size": "10gb"
        },
        "knn.algo_param": {
          "ef_search": "1024",
          "ef_construction": "1024",
          "m": "16"
        },
        "blocks": {
          "write": "true"
        },
        "provided_name": "index_data_m16_ef1024",
        "max_result_window": "1000000",
        "knn": "true",
        "creation_date": "1678200300301",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "240m"
          }
        },
        "number_of_replicas": "1",
        "uuid": "ir0INjAxSGKv_1Gd7VEiQA",
        "version": {
          "created": "7100099",
          "upgraded": "7100099"
        },
        "routing": {
          "allocation": {
            "initial_recovery": {
              "_id": null
            }
          }
        },
        "number_of_shards": "15",
        "routing_partition_size": "1",
        "knn.space_type": "cosinesimil",
        "resize": {
          "source": {
            "name": "index_data_m16_ef1024",
            "uuid": "Bz3uUXjUQU60erWW4SgoMg"
          }
        }
      }
    }
  }
}

What shall I do in this situation?

P.S.: I have TBs of indices, so there is no way to create a new cluster or reindex the indices.

I’ve asked the k-NN team to take a look, cc: @vamshin

Hi @KolesnikovYA

Right, this is a bug: [BUG] Upgrade from ES 7.x to OS 1.x leads to query failure · Issue #255 · opensearch-project/k-NN · GitHub. More details are in the issue. The mitigation for it is to reindex.
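A rough sketch of that mitigation (the index names below are just placeholders; the dimension, space type, and HNSW parameters are taken from your mapping) is to create a fresh index on OpenSearch with the method spelled out explicitly, then reindex into it:

PUT /my_knn_index_v2
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 512,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 1024,
            "m": 16
          }
        }
      }
    }
  }
}

POST /_reindex?wait_for_completion=false
{
  "source": { "index": "my_knn_index" },
  "dest": { "index": "my_knn_index_v2" }
}

Once the reindex finishes, you can point an alias (or your application) at the new index.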

Jack

Hi @jmazane
Thanks for the information; the situation is clearer now.
Am I to understand that if I delete the vector data (in order to free up space), there is no point in reindexing?

@KolesnikovYA When you say delete the vector data, do you mean delete the indices with vectors, or expunge_deleted_documents from the vectors?

@jmazane
The situation is as follows:
I have a kNN index data_index_m16_ef1024 from OpenDistro (that's why I can't use it for searches in OpenSearch) with the following settings and mappings:

{
  "data_index_m16_ef1024": {
    "mappings": {
      "_source": {
        "excludes": [
          "my_vector"
        ]
      },
      "properties": {
        "cam_id": {
          "type": "integer"
        },
        "detect_id": {
          "type": "long"
        },
        "my_vector": {
          "type": "knn_vector",
          "dimension": 512
        },
        "time_check": {
          "type": "date",
          "store": true,
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "-1",
        "translog": {
          "flush_threshold_size": "10gb"
        },
        "knn.algo_param": {
          "ef_search": "1024",
          "ef_construction": "1024",
          "m": "16"
        },
        "blocks": {
          "write": "false"
        },
        "provided_name": "data_index_m16_ef1024",
        "max_result_window": "1000000",
        "knn": "true",
        "creation_date": "1677495524663",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "240m"
          }
        },
        "number_of_replicas": "0",
        "uuid": "Bz3uUXjUQU60erWW4SgoMg",
        "version": {
          "created": "7100099",
          "upgraded": "135217827"
        },
        "number_of_shards": "15",
        "knn.space_type": "cosinesimil"
      }
    }
  }
}

I’ve created an empty index data_index_m16_ef1024_test_reindex with identical settings and mappings:


{
  "data_index_m16_ef1024_test_reindex": {
    "mappings": {
      "_source": {
        "excludes": [
          "my_vector"
        ]
      },
      "properties": {
        "cam_id": {
          "type": "integer"
        },
        "detect_id": {
          "type": "long"
        },
        "my_vector": {
          "type": "knn_vector",
          "dimension": 512
        },
        "time_check": {
          "type": "date",
          "store": true,
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "-1",
        "translog": {
          "flush_threshold_size": "10gb"
        },
        "knn.algo_param": {
          "ef_search": "1024",
          "ef_construction": "1024",
          "m": "16"
        },
        "blocks": {
          "write": "false"
        },
        "provided_name": "data_index_m16_ef1024_test_reindex",
        "max_result_window": "1000000",
        "knn": "true",
        "creation_date": "1678864412387",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "240m"
          }
        },
        "number_of_replicas": "0",
        "uuid": "LoodKjeuTGCgYJJUWp3xvQ",
        "version": {
          "created": "135217827"
        },
        "number_of_shards": "15",
        "knn.space_type": "cosinesimil"
      }
    }
  }
}
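
(For reference, I created it with a request along these lines; settings such as uuid, creation_date, version, and provided_name in the dump above are generated by the cluster and were not part of the request body:)

PUT /data_index_m16_ef1024_test_reindex
{
  "settings": {
    "index.knn": true,
    "index.knn.space_type": "cosinesimil",
    "index.knn.algo_param.ef_search": 1024,
    "index.knn.algo_param.ef_construction": 1024,
    "index.knn.algo_param.m": 16,
    "index.number_of_shards": 15,
    "index.number_of_replicas": 0,
    "index.refresh_interval": "-1",
    "index.max_result_window": 1000000,
    "index.translog.flush_threshold_size": "10gb",
    "index.unassigned.node_left.delayed_timeout": "240m"
  },
  "mappings": {
    "_source": { "excludes": ["my_vector"] },
    "properties": {
      "cam_id": { "type": "integer" },
      "detect_id": { "type": "long" },
      "my_vector": { "type": "knn_vector", "dimension": 512 },
      "time_check": { "type": "date", "store": true, "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}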

Then I sent a POST request to _reindex:

{
   "source":{
      "index":"data_index_m16_ef1024"
   },
   "dest":{
      "index":"data_index_m16_ef1024_test_reindex"
   }
}

and got the following response:

{
    "took": 221557,
    "timed_out": false,
    "total": 9214659,
    "updated": 0,
    "created": 9214659,
    "deleted": 0,
    "batches": 9215,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1.0,
    "throttled_until_millis": 0,
    "failures": []
}

The size of data_index_m16_ef1024 and number of documents:

pri.store.size: 43.8gb
docs.count: 9214659

The size of data_index_m16_ef1024_test_reindex and number of documents:

pri.store.size: 592.2mb
docs.count: 9214659
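
(Both figures come from something like GET /_cat/indices/data_index_m16_ef1024*?v&h=index,pri.store.size,docs.count.)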

So after reindexing, the size of data_index_m16_ef1024_test_reindex is much smaller than data_index_m16_ef1024, and search on data_index_m16_ef1024 doesn't work.

Here are my cluster settings:


{
  "persistent": {
    "action": {
      "destructive_requires_name": "true"
    },
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": "none"
        },
        "allocation": {
          "allow_rebalance": "indices_all_active",
          "cluster_concurrent_rebalance": "15",
          "node_concurrent_recoveries": "2",
          "disk": {
            "threshold_enabled": "true",
            "watermark": {
              "low": "85%",
              "high": "90%"
            }
          },
          "enable": "all",
          "node_concurrent_outgoing_recoveries": "2"
        }
      },
      "metadata": {
        "perf_analyzer": {
          "state": "0"
        }
      }
    },
    "knn": {
      "algo_param": {
        "index_thread_qty": "8"
      },
      "cache": {
        "item": {
          "expiry": {
            "enabled": "false",
            "minutes": "1m"
          }
        }
      },
      "circuit_breaker": {
        "triggered": "false"
      },
      "memory": {
        "circuit_breaker": {
          "limit": "80%",
          "enabled": "true"
        }
      }
    }
  },
  "transient": {}
}

and settings from opensearch.yml:

cluster.routing.allocation.disk.threshold_enabled: false
node.max_local_storage_nodes: 3
thread_pool.search.size: 100

I understand that something is going wrong, but I can't find the mistake.

I see. Are there any circuit breakers getting triggered? Is anything showing up in the logs?

Also, did you confirm the reindex isn't happening in the background? Check with GET /_cat/tasks.
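For example, something along these lines will list any in-flight reindex task (it should come back empty once the reindex has finished):

GET /_tasks?actions=*reindex&detailed=true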

Lastly, when you get one of the docs on the new cluster, is it empty?

GET /data_index_m16_ef1024_test_reindex/_search?pretty

  1. No, there are no circuit breakers getting triggered, and there are no errors in the logs.
  2. Yes, I confirm the reindex is happening.
  3. Here is the response to GET /data_index_m16_ef1024_test_reindex/_search?pretty:
{
  "took" : 46,
  "timed_out" : false,
  "_shards" : {
    "total" : 15,
    "successful" : 15,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

From the above, it seems that when you checked, reindexing was still going on. With those parameters (an ef_construction of 1024), it may take quite some time.

Did the reindexing ever complete? (i.e., do the tasks no longer show a reindex in progress?)