@Frey0-0 Looking at the code, setting `filtered_exact_search_threshold: -1` only disables one of the three paths that lead to exact search. In `KNNWeight.java`:
Path 1: Threshold-based (what `-1` disables).

```java
private boolean isFilteredExactSearchPreferred(final int filterIdsCount) {
    if (filterWeight == null) return false;
    int filterThresholdValue = KNNSettings.getFilteredExactSearchThreshold(...);
    if (isFilterIdCountLessThanK(filterIdsCount)) return true;          // still active
    if (isExactSearchThresholdSettingSet(filterThresholdValue)) { ... } // -1 disables this
    return isMaxDistCompGreaterThanEstimatedDistComp(filterIdsCount);   // still active
}
```
Path 2: Filter count ≤ K (always active, ignores threshold).
Setting the threshold to `-1` does not disable this. If the `has_parent` filter matches fewer docs than `k`, exact search fires unconditionally.
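To make Path 2's precedence concrete, here is a simplified, illustrative paraphrase of that decision logic (not the actual plugin code; the parameter names and the `-1` sentinel handling are assumptions based on the snippet above):

```java
// Illustrative paraphrase of the filtered exact-search decision.
// Names and signatures are simplified, not the real KNNWeight API.
public class ExactSearchDecision {

    static boolean isFilteredExactSearchPreferred(int filterIdsCount, int k, int threshold) {
        // Path 2: fires regardless of the threshold setting.
        if (filterIdsCount < k) {
            return true;
        }
        // Path 1: only consulted when the setting is set (i.e. not -1).
        if (threshold != -1) {
            return filterIdsCount <= threshold;
        }
        // Path 3 analogue: cost-based estimate, omitted in this sketch.
        return false;
    }

    public static void main(String[] args) {
        // Threshold disabled with -1, but the filter matches fewer docs than k:
        System.out.println(isFilteredExactSearchPreferred(5, 10, -1));   // prints true
        // Threshold disabled and filter count > k: this path does not fire.
        System.out.println(isFilteredExactSearchPreferred(500, 10, -1)); // prints false
    }
}
```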
Path 3: ANN returns < K results (post-ANN fallback).
```java
private boolean isExactSearchRequire(...) {
    if (isFilteredExactSearchRequireAfterANNSearch(filterIdsCount, annResultCount)) {
        if (KNNSettings.isKnnIndexFaissEfficientFilterExactSearchDisabled(...)) return false;
        return true; // fallback still happens
    }
}
```
To fully block all fallback paths, you would need both:
```
PUT /my-index/_settings
{
  "index.knn.advanced.filtered_exact_search_threshold": -1,
  "index.knn.faiss.efficient_filter.disable_exact_search": true
}
```
Even then, Path 2 (filter count ≤ `k`) does not appear to be skippable, as it is hardcoded.
Regarding the nested-field exact search issue: this appears to still be present in v3.1.
The nested expansion step always runs exact search. When `expandNestedDocs` is true (which it is for any nested field query), `NativeEngineKnnVectorQuery` always performs an exact search pass after ANN to retrieve all sibling documents of the top-K winners. This has nothing to do with compression or rescore settings; it is unconditional for nested queries, and it was true before v3.1 and remains true in v3.1.
In v3.1, rescore adds a second exact search on top of that. If your index uses compression (4x or higher), a rescore phase now runs between the ANN search and the nested expansion. This rescore phase also calls `getAllSiblings()` when a `parentsFilter` is present, which expands to sibling docs and exact-searches them too.
So in v3.1 the execution for a nested KNN query with compression is:
- ANN search with an upsampled K (3x–5x your actual K)
- Rescore: expand to all sibling docs → exact search (new in v3.1)
- Nested expansion: expand to all sibling docs → exact search again (was already there)
Therefore setting `"rescore": false` in the query eliminates step 2, but step 3 always runs for nested fields. There is no setting that disables the nested-expansion exact search; it is a fundamental part of how parent-child KNN works, since the engine has no other way to score all the child documents belonging to each matched parent.
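The three-step flow above, as rough pseudocode (`getAllSiblings` and `parentsFilter` follow the discussion; the other names are illustrative, not the actual `NativeEngineKnnVectorQuery` internals):

```
results = annSearch(query, upsampledK)             // step 1: 3x–5x your K under compression
if rescoreEnabled:                                 // step 2: skippable with "rescore": false
    siblings = getAllSiblings(results, parentsFilter)
    results = exactSearch(query, siblings)
siblings = getAllSiblings(results, parentsFilter)  // step 3: unconditional for nested fields
return exactSearch(query, siblings)
```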
Q: Do the native engine changes between v2.19 and v3.1 explain the shard latency variance?
In v2.19, every shard did the same amount of work. ANN search is fast and consistent: it traverses the HNSW graph and returns K results regardless of how many total documents are in the shard.
In v3.1, the amount of work per shard depends on how many documents match your filter. Because of the new rescore + nested-expansion exact search, each shard has to compare your query vector against every document that passes the filter, not just K documents. If one shard has 500 matching docs and another has 50,000 matching docs, the second shard does 100× more work.
Your shards don’t have equal data distribution: some shards likely hold more documents for certain tenants or object types (given the `has_parent` + tenant filter in the query). In v2.19 this didn’t matter because ANN ignored it. In v3.1 it matters a lot because exact search is linear in the number of matching documents; the slow shards are simply the ones with more data matching your filter.
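To put the linear-scaling point in numbers, a toy calculation using the counts above:

```java
// Toy arithmetic: exact search costs one distance computation per
// filter-matching doc, so per-shard work tracks data skew directly.
public class ShardCost {
    public static void main(String[] args) {
        int matchesShardA = 500;    // docs passing the filter on shard A
        int matchesShardB = 50_000; // docs passing the filter on shard B
        int ratio = matchesShardB / matchesShardA;
        System.out.println(ratio + "x more distance computations on shard B"); // 100x
        // ANN, by contrast, returns K results at roughly constant cost per
        // shard, which is why v2.19 latencies were uniform across shards.
    }
}
```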
Hope this helps.