The current behavior of k-NN queries is that they first check whether there are unloaded native library indices and load them into RAM before the query is executed. OpenSearch provides the warmup endpoint (
/_plugins/_knn/warmup/index1,index2,index3), which loads these native library indices ahead of time and thereby reduces the latency of the first queries.
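For reference, here is a minimal sketch of how the warmup call is built and triggered; the host and index names are placeholders for our setup, not defaults:

```shell
#!/bin/sh
# Placeholder host and index list; substitute your cluster's values.
OS_HOST="http://localhost:9200"
KNN_INDICES="index1,index2,index3"

# The warmup endpoint is a plain GET on this path.
WARMUP_URL="${OS_HOST}/_plugins/_knn/warmup/${KNN_INDICES}"
echo "$WARMUP_URL"

# Against a running cluster, the actual call would be:
#   curl -s -X GET "$WARMUP_URL"
# As I understand it, the call does not return until the warmup
# operation finishes or the request times out.
```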
I’m wondering whether there are any best practices for using this endpoint in Kubernetes. Our current scenario is as follows:
We have an OpenSearch cluster with k-NN indices whose shards live on different nodes. Sometimes a node may go down, but because each shard has multiple replicas, availability is not reduced. The issue appears when the node comes back online. What we see is the following:
- A node containing a replica shard of a k-NN index goes down. All k-NN queries on this index execute without any extra latency or downtime, because the native library indices on the remaining nodes are already loaded.
- The node that was down comes back online, and now any k-NN query on the index first has to load the native library indices on the node that just rejoined.
We would like to find a strategy that removes the latency caused by the node coming back online. One idea is to prevent the newly connected node from receiving any traffic from the masters until it has warmed up, though I don’t know whether this is technically possible. The warmup endpoint operates on all shards, but can it run on a shard that is not yet available? Alternatively, is there a way to let the warmup endpoint run on the new shard while preventing standard k-NN queries from triggering the load of that shard’s native library indices?
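To make the "no traffic until warmed up" idea concrete, here is a hedged sketch of a script that could back a Kubernetes readiness probe (or postStart hook) on an OpenSearch data pod. OS_HOST, KNN_INDICES, and warmup_ready are names I made up for illustration; the sketch assumes the warmup call blocks until loading completes, so its HTTP status can drive the probe. Whether a probe on the data pod actually keeps shard-level query fan-out away from the node is exactly the open question above:

```shell
#!/bin/sh
# Hedged sketch of a readiness-probe script; assumes the OpenSearch HTTP
# port is reachable inside the pod. OS_HOST and KNN_INDICES are
# illustrative placeholders, not OpenSearch settings.
OS_HOST="${OS_HOST:-http://localhost:9200}"
KNN_INDICES="${KNN_INDICES:-index1,index2,index3}"

# Decide readiness from the HTTP status of the blocking warmup call.
warmup_ready() {
  [ "$1" = "200" ]
}

# In the real probe, the status would come from the cluster, e.g.:
#   status=$(curl -s -o /dev/null -w '%{http_code}' \
#     "${OS_HOST}/_plugins/_knn/warmup/${KNN_INDICES}")
# Here we only demonstrate the decision logic with a fixed status:
if warmup_ready 200; then
  echo "ready"      # probe succeeds; Kubernetes marks the pod Ready
else
  echo "not ready"  # probe fails; traffic stays away from this node
fi
```

Even if this works at the Kubernetes level, readiness only gates Service traffic to the pod; the cluster managers may still route shard-level requests to an allocated shard, which is why I'm asking whether the routing side can be controlled too.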