Migrating and using kNN indexes from OpenDistro to OpenSearch

I have successfully upgraded the cluster directly from OpenDistro 1.12 to OpenSearch 1.0.0 with a rolling upgrade. The cluster and indices are green.

All my indices are kNN indices, and after the upgrade I can't search them. Right now I get the following error:

{'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': '[KNN] Invalid engine type: null'}], 'type': 'search_phase_execution_exception', 'reason': 'all shards failed', 'phase': 'query', 'grouped': True}}

After researching the documentation (Approximate search - OpenSearch documentation), it became clear that OpenSearch has new parameters for kNN indices. I tried to update the mapping for my indices and add these parameters:

"name": "hnsw",
"engine": "nmslib",

So I sent the following PUT request to the index's _mapping endpoint:

{
  "properties": {
      "my_vector":{
          "type": "knn_vector",
          "dimension": 512,
          "method": {
            "name": "hnsw",
            "space_type": "l2",
            "engine": "nmslib",
            "parameters": {
              "ef_construction": 1024,
              "m": 16
            }
          }
      }
  }
}

and got the following answer:

{
    "error": {
        "root_cause": [
            {
                "type": "null_pointer_exception",
                "reason": "Cannot invoke \"org.opensearch.knn.index.KNNMethodContext.getMethodComponent()\" because \"m\" is null"
            }
        ],
        "type": "null_pointer_exception",
        "reason": "Cannot invoke \"org.opensearch.knn.index.KNNMethodContext.getMethodComponent()\" because \"m\" is null"
    },
    "status": 500
}

But if I change “my_vector” to “my_vector3”, everything will be OK:

{
  "properties": {
      "my_vector3":{
          "type": "knn_vector",
          "dimension": 512,
          "method": {
            "name": "hnsw",
            "space_type": "l2",
            "engine": "nmslib",
            "parameters": {
              "ef_construction": 1024,
              "m": 16
            }
          }
      }
  }
}
{
    "acknowledged": true
}

Here is the setup for my index:
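(This is the full dump of aliases, mappings, and settings, retrieved with something like GET /index_data_m16_ef1024?pretty.)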

{
  "index_data_m16_ef1024": {
    "aliases": {},
    "mappings": {
      "_source": {
        "excludes": [
          "my_vector"
        ]
      },
      "properties": {
        "cam_id": {
          "type": "integer"
        },
        "detect_id": {
          "type": "long"
        },
        "my_vector": {
          "type": "knn_vector",
          "dimension": 512
        },
        "my_vector2": {
          "type": "knn_vector",
          "dimension": 512,
          "method": {
            "engine": "nmslib",
            "space_type": "cosinesimil",
            "name": "hnsw",
            "parameters": {
              "ef_construction": 1024,
              "m": 16
            }
          }
        },
        "time_check": {
          "type": "date",
          "store": true,
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "-1",
        "translog": {
          "flush_threshold_size": "10gb"
        },
        "knn.algo_param": {
          "ef_search": "1024",
          "ef_construction": "1024",
          "m": "16"
        },
        "blocks": {
          "write": "true"
        },
        "provided_name": "index_data_m16_ef1024",
        "max_result_window": "1000000",
        "knn": "true",
        "creation_date": "1678200300301",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "240m"
          }
        },
        "number_of_replicas": "1",
        "uuid": "ir0INjAxSGKv_1Gd7VEiQA",
        "version": {
          "created": "7100099",
          "upgraded": "7100099"
        },
        "routing": {
          "allocation": {
            "initial_recovery": {
              "_id": null
            }
          }
        },
        "number_of_shards": "15",
        "routing_partition_size": "1",
        "knn.space_type": "cosinesimil",
        "resize": {
          "source": {
            "name": "index_data_m16_ef1024",
            "uuid": "Bz3uUXjUQU60erWW4SgoMg"
          }
        }
      }
    }
  }
}

What shall I do in this situation?

P.S.: I have TBs of indices, so there is no way to create a new cluster or reindex the indices.

I’ve asked the k-NN team to take a look, cc: @vamshin

Hi @KolesnikovYA

Right, this is a bug: [BUG] Upgrade from ES 7.x to OS 1.x leads to query failure · Issue #255 · opensearch-project/k-NN · GitHub. More details are in the issue. The mitigation for it is to reindex.
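A rough sketch of that mitigation (the index names below are just placeholders; the dimension, space type, and HNSW parameters are taken from your mapping) is to create a fresh index on OpenSearch with the method spelled out explicitly, then reindex into it:

PUT /my_knn_index_v2
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 512,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 1024,
            "m": 16
          }
        }
      }
    }
  }
}

POST /_reindex?wait_for_completion=false
{
  "source": { "index": "my_knn_index" },
  "dest": { "index": "my_knn_index_v2" }
}

Once the reindex finishes, you can point an alias (or your application) at the new index.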

Jack

Hi @jmazane
Thanks for the information; the situation is clearer now.
Am I to understand that if I delete the vector data (in order to free up space), there is no point in reindexing?

@KolesnikovYA When you say delete the vector data, do you mean delete the indices with vectors, or expunge_deleted_documents from the vectors?

@jmazane
The situation is as follows:
I have a kNN index data_index_m16_ef1024 from OpenDistro (that's why I can't use it for searches in OpenSearch) with the following settings and mappings:

{
  "data_index_m16_ef1024": {
    "mappings": {
      "_source": {
        "excludes": [
          "my_vector"
        ]
      },
      "properties": {
        "cam_id": {
          "type": "integer"
        },
        "detect_id": {
          "type": "long"
        },
        "my_vector": {
          "type": "knn_vector",
          "dimension": 512
        },
        "time_check": {
          "type": "date",
          "store": true,
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "-1",
        "translog": {
          "flush_threshold_size": "10gb"
        },
        "knn.algo_param": {
          "ef_search": "1024",
          "ef_construction": "1024",
          "m": "16"
        },
        "blocks": {
          "write": "false"
        },
        "provided_name": "data_index_m16_ef1024",
        "max_result_window": "1000000",
        "knn": "true",
        "creation_date": "1677495524663",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "240m"
          }
        },
        "number_of_replicas": "0",
        "uuid": "Bz3uUXjUQU60erWW4SgoMg",
        "version": {
          "created": "7100099",
          "upgraded": "135217827"
        },
        "number_of_shards": "15",
        "knn.space_type": "cosinesimil"
      }
    }
  }
}

I’ve created an empty index data_index_m16_ef1024_test_reindex with identical settings and mappings:


{
  "data_index_m16_ef1024_test_reindex": {
    "mappings": {
      "_source": {
        "excludes": [
          "my_vector"
        ]
      },
      "properties": {
        "cam_id": {
          "type": "integer"
        },
        "detect_id": {
          "type": "long"
        },
        "my_vector": {
          "type": "knn_vector",
          "dimension": 512
        },
        "time_check": {
          "type": "date",
          "store": true,
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "-1",
        "translog": {
          "flush_threshold_size": "10gb"
        },
        "knn.algo_param": {
          "ef_search": "1024",
          "ef_construction": "1024",
          "m": "16"
        },
        "blocks": {
          "write": "false"
        },
        "provided_name": "data_index_m16_ef1024_test_reindex",
        "max_result_window": "1000000",
        "knn": "true",
        "creation_date": "1678864412387",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "240m"
          }
        },
        "number_of_replicas": "0",
        "uuid": "LoodKjeuTGCgYJJUWp3xvQ",
        "version": {
          "created": "135217827"
        },
        "number_of_shards": "15",
        "knn.space_type": "cosinesimil"
      }
    }
  }
}
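
(For reference, I created it with a request along these lines; settings such as uuid, creation_date, version, and provided_name in the dump above are generated by the cluster and were not part of the request body:)

PUT /data_index_m16_ef1024_test_reindex
{
  "settings": {
    "index.knn": true,
    "index.knn.space_type": "cosinesimil",
    "index.knn.algo_param.ef_search": 1024,
    "index.knn.algo_param.ef_construction": 1024,
    "index.knn.algo_param.m": 16,
    "index.number_of_shards": 15,
    "index.number_of_replicas": 0,
    "index.refresh_interval": "-1",
    "index.max_result_window": 1000000,
    "index.translog.flush_threshold_size": "10gb",
    "index.unassigned.node_left.delayed_timeout": "240m"
  },
  "mappings": {
    "_source": { "excludes": ["my_vector"] },
    "properties": {
      "cam_id": { "type": "integer" },
      "detect_id": { "type": "long" },
      "my_vector": { "type": "knn_vector", "dimension": 512 },
      "time_check": { "type": "date", "store": true, "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}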

Then I sent a POST request to _reindex:

{
   "source":{
      "index":"data_index_m16_ef1024"
   },
   "dest":{
      "index":"data_index_m16_ef1024_test_reindex"
   }
}

and got the following response:

{
    "took": 221557,
    "timed_out": false,
    "total": 9214659,
    "updated": 0,
    "created": 9214659,
    "deleted": 0,
    "batches": 9215,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1.0,
    "throttled_until_millis": 0,
    "failures": []
}

The size of data_index_m16_ef1024 and number of documents:

pri.store.size: 43.8gb
docs.count: 9214659

The size of data_index_m16_ef1024_test_reindex and number of documents:

pri.store.size: 592.2mb
docs.count: 9214659
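
(Both figures come from something like GET /_cat/indices/data_index_m16_ef1024*?v&h=index,pri.store.size,docs.count.)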

So after reindexing, the size of data_index_m16_ef1024_test_reindex is much smaller than data_index_m16_ef1024, and search on data_index_m16_ef1024 doesn't work.

Here are my cluster settings:


{
  "persistent": {
    "action": {
      "destructive_requires_name": "true"
    },
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": "none"
        },
        "allocation": {
          "allow_rebalance": "indices_all_active",
          "cluster_concurrent_rebalance": "15",
          "node_concurrent_recoveries": "2",
          "disk": {
            "threshold_enabled": "true",
            "watermark": {
              "low": "85%",
              "high": "90%"
            }
          },
          "enable": "all",
          "node_concurrent_outgoing_recoveries": "2"
        }
      },
      "metadata": {
        "perf_analyzer": {
          "state": "0"
        }
      }
    },
    "knn": {
      "algo_param": {
        "index_thread_qty": "8"
      },
      "cache": {
        "item": {
          "expiry": {
            "enabled": "false",
            "minutes": "1m"
          }
        }
      },
      "circuit_breaker": {
        "triggered": "false"
      },
      "memory": {
        "circuit_breaker": {
          "limit": "80%",
          "enabled": "true"
        }
      }
    }
  },
  "transient": {}
}

and settings from opensearch.yml:

cluster.routing.allocation.disk.threshold_enabled: false
node.max_local_storage_nodes: 3
thread_pool.search.size: 100

I understand that something is going wrong, but I can't find the mistake.

I see. Are there any circuit breakers getting triggered? Is anything showing up in the logs?

Also, did you confirm the reindex isn't happening in the background? Check with GET /_cat/tasks.
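For example, something along these lines will list any in-flight reindex task (it should come back empty once the reindex has finished):

GET /_tasks?actions=*reindex&detailed=true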

Lastly, when you get one of the docs on the new cluster, is it empty?

GET /data_index_m16_ef1024_test_reindex/_search?pretty

  1. No, there are no circuit breakers getting triggered, and there are no errors in the logs.
  2. Yes, I confirm the reindex is happening.
  3. Here is the response to GET /data_index_m16_ef1024_test_reindex/_search?pretty:
{
  "took" : 46,
  "timed_out" : false,
  "_shards" : {
    "total" : 15,
    "successful" : 15,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

From the above, it seems that when you checked, reindexing was still going on. With those parameters (an ef_construction of 1024), it may take quite some time.

Did the reindexing ever complete? (i.e., do the tasks no longer show a reindex in progress?)