Poor recall in ODFE1.8

I’m using k-NN with the cosinesimil space type. When I index documents, some of them do not appear to be indexed properly: even searching on the exact vector does not return them. I tried /_update_by_query, and each time it appears to report “failure” for an arbitrary number of documents (different each time). After enough _update_by_query calls on the index, the document I’m looking for eventually appears (but I assume others are then not indexed).

I have not encountered this issue in ODFE1.10.1, but seeing as 1.8 is the latest version available on managed AWS, I need this working there too.

Any idea what is causing this and how to address it?

Hi @timforr

Looking into this. Could you provide the commands you use in your workflow to reach this bug so we can reproduce?

It may be related to KNN graphs occasionally are indexed with wrong spaceType · Issue #239 · opendistro-for-elasticsearch/k-NN · GitHub.

It may be related to Bad recall from ODFE1.8 · Issue #154 · opendistro-for-elasticsearch/k-NN · GitHub also, which was fixed in 1.9.0 but not in 1.8.0.

What I did was create a new index, enable k-NN, and set the space type to cosinesimil. My document mappings had two knn_vector fields of 100 dimensions each, in addition to some text fields.
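For anyone trying to reproduce this, here is a minimal sketch of the index creation request described above. It assumes the k-NN plugin's documented `index.knn` and `index.knn.space_type` settings and the `knn_vector` field type. The field names `title_vector`, `body_vector`, and `title` are hypothetical placeholders, not the actual mapping used in this thread.

```python
import json


def build_index_body(vector_fields, dimension=100, space_type="cosinesimil"):
    """Build a create-index body with k-NN enabled and one knn_vector per field."""
    properties = {
        name: {"type": "knn_vector", "dimension": dimension}
        for name in vector_fields
    }
    # Hypothetical text field standing in for the "some text fields" in the post.
    properties["title"] = {"type": "text"}
    return {
        "settings": {"index": {"knn": True, "knn.space_type": space_type}},
        "mappings": {"properties": properties},
    }


# Two hypothetical 100-dimensional vector fields, as in the description.
body = build_index_body(["title_vector", "body_vector"])
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of a PUT request to the index URL. Setting the space type explicitly at creation time makes it easier to verify afterwards (via the index settings) that the index was actually built with the intended metric.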

I index documents one by one and then send a k-NN query. Recall is poor: some documents aren’t returned even when the query vector is exactly the vector they have stored.
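The exact-vector check described above can be sketched roughly as follows. It assumes the plugin's `knn` query clause (`vector` and `k` under the field name) and the standard `hits.hits[]._id` layout of a search response. The field name, document IDs, and the sample response are made up for illustration.

```python
def build_knn_query(field, vector, k=10):
    """Build a k-NN search body asking for the k nearest neighbours of `vector`."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


def doc_is_recalled(response, doc_id):
    """Return True if `doc_id` appears among the hits of a search response."""
    hits = response.get("hits", {}).get("hits", [])
    return any(hit.get("_id") == doc_id for hit in hits)


# Hypothetical field name and a dummy 100-dimensional query vector.
query = build_knn_query("title_vector", [0.1] * 100, k=5)

# Made-up response in the usual search-response shape, for illustration only.
fake_response = {
    "hits": {"hits": [{"_id": "doc-1", "_score": 1.0}, {"_id": "doc-7", "_score": 0.93}]}
}
print(doc_is_recalled(fake_response, "doc-1"))
print(doc_is_recalled(fake_response, "doc-42"))
```

Running this check for every stored document, with its own vector as the query, gives a simple self-recall figure that can be compared across versions.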

Then I tried “_update_by_query” and looked at “_stats” after it finished. /_stats reports that some of the documents in the index are now deleted (the number deleted is random and varies a lot every time). I then perform a query and recall is still poor. If I call /_stats again after that query, the deleted count has decreased but is still positive (e.g. 1k out of 18k documents). If I perform subsequent queries, the results and /_stats remain constant (and recall is still poor, with some documents just not being returned).

Got it. Yes, I think that may be related to Bad recall from ODFE1.8 · Issue #154 · opendistro-for-elasticsearch/k-NN · GitHub. I just patched it in the opendistro-1.8 branch: recall bug fix for odfe>=1.8 by jmazanec15 · Pull Request #246 · opendistro-for-elasticsearch/k-NN · GitHub.

To confirm that this fixes the issue, you could build the plugin from source and install it on your cluster. What distribution type are you using for your cluster (rpm, deb, tar, Docker)? I can provide instructions for building from source for the distribution you are using.

That is great, thank you for your quick attention. I am using the 1.8 version of ES that is managed by AWS. Is there a way I can deploy the patch there?

Edit: Ah, I see the response here Bad recall from ODFE1.8 · Issue #154 · opendistro-for-elasticsearch/k-NN · GitHub indicating I should be all set. I will do some testing this week and reply on this thread if I’m still seeing issues. Thanks!

Sounds good, thanks @timforr

I’m actually still seeing this issue even in 1.10.1. When I call /_stats on my index, it lists the full count (~18k) but also lists ~2k as deleted. Certain documents are never showing up in KNN, even when their exact vector is used to query for them. I assume these are the “deleted” ones. This has been a consistent issue across multiple indices and multiple instances of ES, so I wonder if it could be replicated easily on your end?

Here are the results from 1) _stats, 2) _update_by_query, 3) _stats. Note the weirdness where the number of “deleted” documents changes after an _update_by_query.

1:

curl -X GET "localhost:9205/research-index/_stats?pretty"
{
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "docs" : {
        "count" : 18632,
        "deleted" : 2180
      },
      "store" : {
        "size_in_bytes" : 359510890,
        "reserved_in_bytes" : 0
      },
      "indexing" : {
        "index_total" : 0,
        "index_time_in_millis" : 0,
        "index_current" : 0,
        "index_failed" : 0,
        "delete_total" : 0,
        "delete_time_in_millis" : 0,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0
      },
      "get" : {
        "total" : 0,
        "time_in_millis" : 0,
        "exists_total" : 0,
        "exists_time_in_millis" : 0,
        "missing_total" : 0,
        "missing_time_in_millis" : 0,
        "current" : 0
      },
      "search" : {
        "open_contexts" : 0,
        "query_total" : 26,
        "query_time_in_millis" : 321,
        "query_current" : 0,
        "fetch_total" : 26,
        "fetch_time_in_millis" : 225,
        "fetch_current" : 0,
        "scroll_total" : 0,
        "scroll_time_in_millis" : 0,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      },
      "merges" : {
        "current" : 0,
        "current_docs" : 0,
        "current_size_in_bytes" : 0,
        "total" : 0,
        "total_time_in_millis" : 0,
        "total_docs" : 0,
        "total_size_in_bytes" : 0,
        "total_stopped_time_in_millis" : 0,
        "total_throttled_time_in_millis" : 0,
        "total_auto_throttle_in_bytes" : 20971520
      },
      "refresh" : {
        "total" : 2,
        "total_time_in_millis" : 0,
        "external_total" : 2,
        "external_total_time_in_millis" : 1,
        "listeners" : 0
      },
      "flush" : {
        "total" : 1,
        "periodic" : 0,
        "total_time_in_millis" : 0
      },
      "warmer" : {
        "current" : 0,
        "total" : 1,
        "total_time_in_millis" : 0
      },
      "query_cache" : {
        "memory_size_in_bytes" : 2076,
        "total_count" : 40,
        "hit_count" : 5,
        "miss_count" : 35,
        "cache_size" : 1,
        "cache_count" : 1,
        "evictions" : 0
      },
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0
      },
      "completion" : {
        "size_in_bytes" : 0
      },
      "segments" : {
        "count" : 2,
        "memory_in_bytes" : 37728,
        "terms_memory_in_bytes" : 25216,
        "stored_fields_memory_in_bytes" : 1968,
        "term_vectors_memory_in_bytes" : 0,
        "norms_memory_in_bytes" : 3840,
        "points_memory_in_bytes" : 0,
        "doc_values_memory_in_bytes" : 6704,
        "index_writer_memory_in_bytes" : 0,
        "version_map_memory_in_bytes" : 0,
        "fixed_bit_set_memory_in_bytes" : 0,
        "max_unsafe_auto_id_timestamp" : -1,
        "file_sizes" : { }
      },
      "translog" : {
        "operations" : 0,
        "size_in_bytes" : 55,
        "uncommitted_operations" : 0,
        "uncommitted_size_in_bytes" : 55,
        "earliest_last_modified_age" : 0
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 0,
        "miss_count" : 0
      },
      "recovery" : {
        "current_as_source" : 0,
        "current_as_target" : 0,
        "throttle_time_in_millis" : 0
      }
    },
    "total" : {
      "docs" : {
        "count" : 18632,
        "deleted" : 2180
      },
      "store" : {
        "size_in_bytes" : 359510890,
        "reserved_in_bytes" : 0
      },
      "indexing" : {
        "index_total" : 0,
        "index_time_in_millis" : 0,
        "index_current" : 0,
        "index_failed" : 0,
        "delete_total" : 0,
        "delete_time_in_millis" : 0,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0
      },
      "get" : {
        "total" : 0,
        "time_in_millis" : 0,
        "exists_total" : 0,
        "exists_time_in_millis" : 0,
        "missing_total" : 0,
        "missing_time_in_millis" : 0,
        "current" : 0
      },
      "search" : {
        "open_contexts" : 0,
        "query_total" : 26,
        "query_time_in_millis" : 321,
        "query_current" : 0,
        "fetch_total" : 26,
        "fetch_time_in_millis" : 225,
        "fetch_current" : 0,
        "scroll_total" : 0,
        "scroll_time_in_millis" : 0,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      },
      "merges" : {
        "current" : 0,
        "current_docs" : 0,
        "current_size_in_bytes" : 0,
        "total" : 0,
        "total_time_in_millis" : 0,
        "total_docs" : 0,
        "total_size_in_bytes" : 0,
        "total_stopped_time_in_millis" : 0,
        "total_throttled_time_in_millis" : 0,
        "total_auto_throttle_in_bytes" : 20971520
      },
      "refresh" : {
        "total" : 2,
        "total_time_in_millis" : 0,
        "external_total" : 2,
        "external_total_time_in_millis" : 1,
        "listeners" : 0
      },
      "flush" : {
        "total" : 1,
        "periodic" : 0,
        "total_time_in_millis" : 0
      },
      "warmer" : {
        "current" : 0,
        "total" : 1,
        "total_time_in_millis" : 0
      },
      "query_cache" : {
        "memory_size_in_bytes" : 2076,
        "total_count" : 40,
        "hit_count" : 5,
        "miss_count" : 35,
        "cache_size" : 1,
        "cache_count" : 1,
        "evictions" : 0
      },
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0
      },
      "completion" : {
        "size_in_bytes" : 0
      },
      "segments" : {
        "count" : 2,
        "memory_in_bytes" : 37728,
        "terms_memory_in_bytes" : 25216,
        "stored_fields_memory_in_bytes" : 1968,
        "term_vectors_memory_in_bytes" : 0,
        "norms_memory_in_bytes" : 3840,
        "points_memory_in_bytes" : 0,
        "doc_values_memory_in_bytes" : 6704,
        "index_writer_memory_in_bytes" : 0,
        "version_map_memory_in_bytes" : 0,
        "fixed_bit_set_memory_in_bytes" : 0,
        "max_unsafe_auto_id_timestamp" : -1,
        "file_sizes" : { }
      },
      "translog" : {
        "operations" : 0,
        "size_in_bytes" : 55,
        "uncommitted_operations" : 0,
        "uncommitted_size_in_bytes" : 55,
        "earliest_last_modified_age" : 0
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 0,
        "miss_count" : 0
      },
      "recovery" : {
        "current_as_source" : 0,
        "current_as_target" : 0,
        "throttle_time_in_millis" : 0
      }
    }
  },
  "indices" : {
    "research-index" : {
      "uuid" : "ltZ3Mh2JR2uxnQrPCOg0Ig",
      "primaries" : {
        "docs" : {
          "count" : 18632,
          "deleted" : 2180
        },
        "store" : {
          "size_in_bytes" : 359510890,
          "reserved_in_bytes" : 0
        },
        "indexing" : {
          "index_total" : 0,
          "index_time_in_millis" : 0,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 0,
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 26,
          "query_time_in_millis" : 321,
          "query_current" : 0,
          "fetch_total" : 26,
          "fetch_time_in_millis" : 225,
          "fetch_current" : 0,
          "scroll_total" : 0,
          "scroll_time_in_millis" : 0,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 0,
          "current_docs" : 0,
          "current_size_in_bytes" : 0,
          "total" : 0,
          "total_time_in_millis" : 0,
          "total_docs" : 0,
          "total_size_in_bytes" : 0,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 0,
          "total_auto_throttle_in_bytes" : 20971520
        },
        "refresh" : {
          "total" : 2,
          "total_time_in_millis" : 0,
          "external_total" : 2,
          "external_total_time_in_millis" : 1,
          "listeners" : 0
        },
        "flush" : {
          "total" : 1,
          "periodic" : 0,
          "total_time_in_millis" : 0
        },
        "warmer" : {
          "current" : 0,
          "total" : 1,
          "total_time_in_millis" : 0
        },
        "query_cache" : {
          "memory_size_in_bytes" : 2076,
          "total_count" : 40,
          "hit_count" : 5,
          "miss_count" : 35,
          "cache_size" : 1,
          "cache_count" : 1,
          "evictions" : 0
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 2,
          "memory_in_bytes" : 37728,
          "terms_memory_in_bytes" : 25216,
          "stored_fields_memory_in_bytes" : 1968,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 3840,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 6704,
          "index_writer_memory_in_bytes" : 0,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : -1,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 0,
          "size_in_bytes" : 55,
          "uncommitted_operations" : 0,
          "uncommitted_size_in_bytes" : 55,
          "earliest_last_modified_age" : 0
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 0,
          "miss_count" : 0
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        }
      },
      "total" : {
        "docs" : {
          "count" : 18632,
          "deleted" : 2180
        },
        "store" : {
          "size_in_bytes" : 359510890,
          "reserved_in_bytes" : 0
        },
        "indexing" : {
          "index_total" : 0,
          "index_time_in_millis" : 0,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 0,
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 26,
          "query_time_in_millis" : 321,
          "query_current" : 0,
          "fetch_total" : 26,
          "fetch_time_in_millis" : 225,
          "fetch_current" : 0,
          "scroll_total" : 0,
          "scroll_time_in_millis" : 0,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 0,
          "current_docs" : 0,
          "current_size_in_bytes" : 0,
          "total" : 0,
          "total_time_in_millis" : 0,
          "total_docs" : 0,
          "total_size_in_bytes" : 0,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 0,
          "total_auto_throttle_in_bytes" : 20971520
        },
        "refresh" : {
          "total" : 2,
          "total_time_in_millis" : 0,
          "external_total" : 2,
          "external_total_time_in_millis" : 1,
          "listeners" : 0
        },
        "flush" : {
          "total" : 1,
          "periodic" : 0,
          "total_time_in_millis" : 0
        },
        "warmer" : {
          "current" : 0,
          "total" : 1,
          "total_time_in_millis" : 0
        },
        "query_cache" : {
          "memory_size_in_bytes" : 2076,
          "total_count" : 40,
          "hit_count" : 5,
          "miss_count" : 35,
          "cache_size" : 1,
          "cache_count" : 1,
          "evictions" : 0
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 2,
          "memory_in_bytes" : 37728,
          "terms_memory_in_bytes" : 25216,
          "stored_fields_memory_in_bytes" : 1968,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 3840,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 6704,
          "index_writer_memory_in_bytes" : 0,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : -1,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 0,
          "size_in_bytes" : 55,
          "uncommitted_operations" : 0,
          "uncommitted_size_in_bytes" : 55,
          "earliest_last_modified_age" : 0
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 0,
          "miss_count" : 0
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        }
      }
    }
  }
}

2:

curl -X POST "localhost:9205/research-index/_update_by_query?pretty"
{
  "took" : 40900,
  "timed_out" : false,
  "total" : 18632,
  "updated" : 18632,
  "deleted" : 0,
  "batches" : 19,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

3:

curl -X GET "localhost:9205/research-index/_stats?pretty"
{
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "docs" : {
        "count" : 18632,
        "deleted" : 2698
      },
      "store" : {
        "size_in_bytes" : 948269715,
        "reserved_in_bytes" : 0
      },
      "indexing" : {
        "index_total" : 18632,
        "index_time_in_millis" : 35866,
        "index_current" : 0,
        "index_failed" : 0,
        "delete_total" : 0,
        "delete_time_in_millis" : 0,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0
      },
      "get" : {
        "total" : 0,
        "time_in_millis" : 0,
        "exists_total" : 0,
        "exists_time_in_millis" : 0,
        "missing_total" : 0,
        "missing_time_in_millis" : 0,
        "current" : 0
      },
      "search" : {
        "open_contexts" : 0,
        "query_total" : 46,
        "query_time_in_millis" : 356,
        "query_current" : 0,
        "fetch_total" : 46,
        "fetch_time_in_millis" : 3519,
        "fetch_current" : 0,
        "scroll_total" : 1,
        "scroll_time_in_millis" : 40896,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      },
      "merges" : {
        "current" : 1,
        "current_docs" : 27812,
        "current_size_in_bytes" : 491844979,
        "total" : 0,
        "total_time_in_millis" : 0,
        "total_docs" : 0,
        "total_size_in_bytes" : 0,
        "total_stopped_time_in_millis" : 0,
        "total_throttled_time_in_millis" : 0,
        "total_auto_throttle_in_bytes" : 19065018
      },
      "refresh" : {
        "total" : 12,
        "total_time_in_millis" : 19364,
        "external_total" : 12,
        "external_total_time_in_millis" : 19484,
        "listeners" : 0
      },
      "flush" : {
        "total" : 1,
        "periodic" : 0,
        "total_time_in_millis" : 0
      },
      "warmer" : {
        "current" : 0,
        "total" : 11,
        "total_time_in_millis" : 1
      },
      "query_cache" : {
        "memory_size_in_bytes" : 2076,
        "total_count" : 41,
        "hit_count" : 5,
        "miss_count" : 36,
        "cache_size" : 1,
        "cache_count" : 1,
        "evictions" : 0
      },
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0
      },
      "completion" : {
        "size_in_bytes" : 0
      },
      "segments" : {
        "count" : 11,
        "memory_in_bytes" : 205808,
        "terms_memory_in_bytes" : 138688,
        "stored_fields_memory_in_bytes" : 6168,
        "term_vectors_memory_in_bytes" : 0,
        "norms_memory_in_bytes" : 21120,
        "points_memory_in_bytes" : 0,
        "doc_values_memory_in_bytes" : 39832,
        "index_writer_memory_in_bytes" : 66223296,
        "version_map_memory_in_bytes" : 759339,
        "fixed_bit_set_memory_in_bytes" : 0,
        "max_unsafe_auto_id_timestamp" : -1,
        "file_sizes" : { }
      },
      "translog" : {
        "operations" : 18632,
        "size_in_bytes" : 318730922,
        "uncommitted_operations" : 18632,
        "uncommitted_size_in_bytes" : 318730922,
        "earliest_last_modified_age" : 0
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 0,
        "miss_count" : 0
      },
      "recovery" : {
        "current_as_source" : 0,
        "current_as_target" : 0,
        "throttle_time_in_millis" : 0
      }
    },
    "total" : {
      "docs" : {
        "count" : 18632,
        "deleted" : 2698
      },
      "store" : {
        "size_in_bytes" : 948269715,
        "reserved_in_bytes" : 0
      },
      "indexing" : {
        "index_total" : 18632,
        "index_time_in_millis" : 35866,
        "index_current" : 0,
        "index_failed" : 0,
        "delete_total" : 0,
        "delete_time_in_millis" : 0,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0
      },
      "get" : {
        "total" : 0,
        "time_in_millis" : 0,
        "exists_total" : 0,
        "exists_time_in_millis" : 0,
        "missing_total" : 0,
        "missing_time_in_millis" : 0,
        "current" : 0
      },
      "search" : {
        "open_contexts" : 0,
        "query_total" : 46,
        "query_time_in_millis" : 356,
        "query_current" : 0,
        "fetch_total" : 46,
        "fetch_time_in_millis" : 3519,
        "fetch_current" : 0,
        "scroll_total" : 1,
        "scroll_time_in_millis" : 40896,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      },
      "merges" : {
        "current" : 1,
        "current_docs" : 27812,
        "current_size_in_bytes" : 491844979,
        "total" : 0,
        "total_time_in_millis" : 0,
        "total_docs" : 0,
        "total_size_in_bytes" : 0,
        "total_stopped_time_in_millis" : 0,
        "total_throttled_time_in_millis" : 0,
        "total_auto_throttle_in_bytes" : 19065018
      },
      "refresh" : {
        "total" : 12,
        "total_time_in_millis" : 19364,
        "external_total" : 12,
        "external_total_time_in_millis" : 19484,
        "listeners" : 0
      },
      "flush" : {
        "total" : 1,
        "periodic" : 0,
        "total_time_in_millis" : 0
      },
      "warmer" : {
        "current" : 0,
        "total" : 11,
        "total_time_in_millis" : 1
      },
      "query_cache" : {
        "memory_size_in_bytes" : 2076,
        "total_count" : 41,
        "hit_count" : 5,
        "miss_count" : 36,
        "cache_size" : 1,
        "cache_count" : 1,
        "evictions" : 0
      },
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0
      },
      "completion" : {
        "size_in_bytes" : 0
      },
      "segments" : {
        "count" : 11,
        "memory_in_bytes" : 205808,
        "terms_memory_in_bytes" : 138688,
        "stored_fields_memory_in_bytes" : 6168,
        "term_vectors_memory_in_bytes" : 0,
        "norms_memory_in_bytes" : 21120,
        "points_memory_in_bytes" : 0,
        "doc_values_memory_in_bytes" : 39832,
        "index_writer_memory_in_bytes" : 66223296,
        "version_map_memory_in_bytes" : 759339,
        "fixed_bit_set_memory_in_bytes" : 0,
        "max_unsafe_auto_id_timestamp" : -1,
        "file_sizes" : { }
      },
      "translog" : {
        "operations" : 18632,
        "size_in_bytes" : 318730922,
        "uncommitted_operations" : 18632,
        "uncommitted_size_in_bytes" : 318730922,
        "earliest_last_modified_age" : 0
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 0,
        "miss_count" : 0
      },
      "recovery" : {
        "current_as_source" : 0,
        "current_as_target" : 0,
        "throttle_time_in_millis" : 0
      }
    }
  },
  "indices" : {
    "research-index" : {
      "uuid" : "ltZ3Mh2JR2uxnQrPCOg0Ig",
      "primaries" : {
        "docs" : {
          "count" : 18632,
          "deleted" : 2698
        },
        "store" : {
          "size_in_bytes" : 948269715,
          "reserved_in_bytes" : 0
        },
        "indexing" : {
          "index_total" : 18632,
          "index_time_in_millis" : 35866,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 0,
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 46,
          "query_time_in_millis" : 356,
          "query_current" : 0,
          "fetch_total" : 46,
          "fetch_time_in_millis" : 3519,
          "fetch_current" : 0,
          "scroll_total" : 1,
          "scroll_time_in_millis" : 40896,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 1,
          "current_docs" : 27812,
          "current_size_in_bytes" : 491844979,
          "total" : 0,
          "total_time_in_millis" : 0,
          "total_docs" : 0,
          "total_size_in_bytes" : 0,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 0,
          "total_auto_throttle_in_bytes" : 19065018
        },
        "refresh" : {
          "total" : 12,
          "total_time_in_millis" : 19364,
          "external_total" : 12,
          "external_total_time_in_millis" : 19484,
          "listeners" : 0
        },
        "flush" : {
          "total" : 1,
          "periodic" : 0,
          "total_time_in_millis" : 0
        },
        "warmer" : {
          "current" : 0,
          "total" : 11,
          "total_time_in_millis" : 1
        },
        "query_cache" : {
          "memory_size_in_bytes" : 2076,
          "total_count" : 41,
          "hit_count" : 5,
          "miss_count" : 36,
          "cache_size" : 1,
          "cache_count" : 1,
          "evictions" : 0
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 11,
          "memory_in_bytes" : 205808,
          "terms_memory_in_bytes" : 138688,
          "stored_fields_memory_in_bytes" : 6168,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 21120,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 39832,
          "index_writer_memory_in_bytes" : 66223296,
          "version_map_memory_in_bytes" : 759339,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : -1,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 18632,
          "size_in_bytes" : 318730922,
          "uncommitted_operations" : 18632,
          "uncommitted_size_in_bytes" : 318730922,
          "earliest_last_modified_age" : 0
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 0,
          "miss_count" : 0
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        }
      },
      "total" : {
        "docs" : {
          "count" : 18632,
          "deleted" : 2698
        },
        "store" : {
          "size_in_bytes" : 948269715,
          "reserved_in_bytes" : 0
        },
        "indexing" : {
          "index_total" : 18632,
          "index_time_in_millis" : 35866,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 0,
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 46,
          "query_time_in_millis" : 356,
          "query_current" : 0,
          "fetch_total" : 46,
          "fetch_time_in_millis" : 3519,
          "fetch_current" : 0,
          "scroll_total" : 1,
          "scroll_time_in_millis" : 40896,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 1,
          "current_docs" : 27812,
          "current_size_in_bytes" : 491844979,
          "total" : 0,
          "total_time_in_millis" : 0,
          "total_docs" : 0,
          "total_size_in_bytes" : 0,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 0,
          "total_auto_throttle_in_bytes" : 19065018
        },
        "refresh" : {
          "total" : 12,
          "total_time_in_millis" : 19364,
          "external_total" : 12,
          "external_total_time_in_millis" : 19484,
          "listeners" : 0
        },
        "flush" : {
          "total" : 1,
          "periodic" : 0,
          "total_time_in_millis" : 0
        },
        "warmer" : {
          "current" : 0,
          "total" : 11,
          "total_time_in_millis" : 1
        },
        "query_cache" : {
          "memory_size_in_bytes" : 2076,
          "total_count" : 41,
          "hit_count" : 5,
          "miss_count" : 36,
          "cache_size" : 1,
          "cache_count" : 1,
          "evictions" : 0
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 11,
          "memory_in_bytes" : 205808,
          "terms_memory_in_bytes" : 138688,
          "stored_fields_memory_in_bytes" : 6168,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 21120,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 39832,
          "index_writer_memory_in_bytes" : 66223296,
          "version_map_memory_in_bytes" : 759339,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : -1,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 18632,
          "size_in_bytes" : 318730922,
          "uncommitted_operations" : 18632,
          "uncommitted_size_in_bytes" : 318730922,
          "earliest_last_modified_age" : 0
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 0,
          "miss_count" : 0
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        }
      }
    }
  }
}

After the _update_by_query, I run a KNN query. The first attempt times out; when I retry, I get a response. The document that was previously missing is now returned, but other documents are now missing instead. Oddly, after this query, _stats returns something different (again, note the changed “deleted” count):

curl -X GET "localhost:9205/research-index/_stats?pretty"
{
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "docs" : {
        "count" : 18632,
        "deleted" : 11632
      },
      "store" : {
        "size_in_bytes" : 1196177467,
        "reserved_in_bytes" : 0
      },
      "indexing" : {
        "index_total" : 18632,
        "index_time_in_millis" : 35866,
        "index_current" : 0,
        "index_failed" : 0,
        "delete_total" : 0,
        "delete_time_in_millis" : 0,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0
      },
      "get" : {
        "total" : 0,
        "time_in_millis" : 0,
        "exists_total" : 0,
        "exists_time_in_millis" : 0,
        "missing_total" : 0,
        "missing_time_in_millis" : 0,
        "current" : 0
      },
      "search" : {
        "open_contexts" : 0,
        "query_total" : 48,
        "query_time_in_millis" : 406,
        "query_current" : 0,
        "fetch_total" : 48,
        "fetch_time_in_millis" : 3535,
        "fetch_current" : 0,
        "scroll_total" : 1,
        "scroll_time_in_millis" : 40896,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      },
      "merges" : {
        "current" : 1,
        "current_docs" : 30264,
        "current_size_in_bytes" : 524525951,
        "total" : 1,
        "total_time_in_millis" : 48685,
        "total_docs" : 27812,
        "total_size_in_bytes" : 491844979,
        "total_stopped_time_in_millis" : 0,
        "total_throttled_time_in_millis" : 11209,
        "total_auto_throttle_in_bytes" : 17331834
      },
      "refresh" : {
        "total" : 14,
        "total_time_in_millis" : 28465,
        "external_total" : 14,
        "external_total_time_in_millis" : 28682,
        "listeners" : 0
      },
      "flush" : {
        "total" : 1,
        "periodic" : 0,
        "total_time_in_millis" : 0
      },
      "warmer" : {
        "current" : 0,
        "total" : 13,
        "total_time_in_millis" : 1
      },
      "query_cache" : {
        "memory_size_in_bytes" : 2940,
        "total_count" : 43,
        "hit_count" : 5,
        "miss_count" : 38,
        "cache_size" : 1,
        "cache_count" : 2,
        "evictions" : 1
      },
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0
      },
      "completion" : {
        "size_in_bytes" : 0
      },
      "segments" : {
        "count" : 5,
        "memory_in_bytes" : 92204,
        "terms_memory_in_bytes" : 63040,
        "stored_fields_memory_in_bytes" : 3864,
        "term_vectors_memory_in_bytes" : 0,
        "norms_memory_in_bytes" : 9600,
        "points_memory_in_bytes" : 0,
        "doc_values_memory_in_bytes" : 15700,
        "index_writer_memory_in_bytes" : 0,
        "version_map_memory_in_bytes" : 0,
        "fixed_bit_set_memory_in_bytes" : 0,
        "max_unsafe_auto_id_timestamp" : -1,
        "file_sizes" : { }
      },
      "translog" : {
        "operations" : 18632,
        "size_in_bytes" : 318730922,
        "uncommitted_operations" : 18632,
        "uncommitted_size_in_bytes" : 318730922,
        "earliest_last_modified_age" : 0
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 0,
        "miss_count" : 0
      },
      "recovery" : {
        "current_as_source" : 0,
        "current_as_target" : 0,
        "throttle_time_in_millis" : 0
      }
    },
    "total" : {
      "docs" : {
        "count" : 18632,
        "deleted" : 11632
      },
      "store" : {
        "size_in_bytes" : 1196177467,
        "reserved_in_bytes" : 0
      },
      "indexing" : {
        "index_total" : 18632,
        "index_time_in_millis" : 35866,
        "index_current" : 0,
        "index_failed" : 0,
        "delete_total" : 0,
        "delete_time_in_millis" : 0,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0
      },
      "get" : {
        "total" : 0,
        "time_in_millis" : 0,
        "exists_total" : 0,
        "exists_time_in_millis" : 0,
        "missing_total" : 0,
        "missing_time_in_millis" : 0,
        "current" : 0
      },
      "search" : {
        "open_contexts" : 0,
        "query_total" : 48,
        "query_time_in_millis" : 406,
        "query_current" : 0,
        "fetch_total" : 48,
        "fetch_time_in_millis" : 3535,
        "fetch_current" : 0,
        "scroll_total" : 1,
        "scroll_time_in_millis" : 40896,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      },
      "merges" : {
        "current" : 1,
        "current_docs" : 30264,
        "current_size_in_bytes" : 524525951,
        "total" : 1,
        "total_time_in_millis" : 48685,
        "total_docs" : 27812,
        "total_size_in_bytes" : 491844979,
        "total_stopped_time_in_millis" : 0,
        "total_throttled_time_in_millis" : 11209,
        "total_auto_throttle_in_bytes" : 17331834
      },
      "refresh" : {
        "total" : 14,
        "total_time_in_millis" : 28465,
        "external_total" : 14,
        "external_total_time_in_millis" : 28682,
        "listeners" : 0
      },
      "flush" : {
        "total" : 1,
        "periodic" : 0,
        "total_time_in_millis" : 0
      },
      "warmer" : {
        "current" : 0,
        "total" : 13,
        "total_time_in_millis" : 1
      },
      "query_cache" : {
        "memory_size_in_bytes" : 2940,
        "total_count" : 43,
        "hit_count" : 5,
        "miss_count" : 38,
        "cache_size" : 1,
        "cache_count" : 2,
        "evictions" : 1
      },
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0
      },
      "completion" : {
        "size_in_bytes" : 0
      },
      "segments" : {
        "count" : 5,
        "memory_in_bytes" : 92204,
        "terms_memory_in_bytes" : 63040,
        "stored_fields_memory_in_bytes" : 3864,
        "term_vectors_memory_in_bytes" : 0,
        "norms_memory_in_bytes" : 9600,
        "points_memory_in_bytes" : 0,
        "doc_values_memory_in_bytes" : 15700,
        "index_writer_memory_in_bytes" : 0,
        "version_map_memory_in_bytes" : 0,
        "fixed_bit_set_memory_in_bytes" : 0,
        "max_unsafe_auto_id_timestamp" : -1,
        "file_sizes" : { }
      },
      "translog" : {
        "operations" : 18632,
        "size_in_bytes" : 318730922,
        "uncommitted_operations" : 18632,
        "uncommitted_size_in_bytes" : 318730922,
        "earliest_last_modified_age" : 0
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 0,
        "miss_count" : 0
      },
      "recovery" : {
        "current_as_source" : 0,
        "current_as_target" : 0,
        "throttle_time_in_millis" : 0
      }
    }
  },
  "indices" : {
    "research-index" : {
      "uuid" : "ltZ3Mh2JR2uxnQrPCOg0Ig",
      "primaries" : {
        "docs" : {
          "count" : 18632,
          "deleted" : 11632
        },
        "store" : {
          "size_in_bytes" : 1196177467,
          "reserved_in_bytes" : 0
        },
        "indexing" : {
          "index_total" : 18632,
          "index_time_in_millis" : 35866,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 0,
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 48,
          "query_time_in_millis" : 406,
          "query_current" : 0,
          "fetch_total" : 48,
          "fetch_time_in_millis" : 3535,
          "fetch_current" : 0,
          "scroll_total" : 1,
          "scroll_time_in_millis" : 40896,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 1,
          "current_docs" : 30264,
          "current_size_in_bytes" : 524525951,
          "total" : 1,
          "total_time_in_millis" : 48685,
          "total_docs" : 27812,
          "total_size_in_bytes" : 491844979,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 11209,
          "total_auto_throttle_in_bytes" : 17331834
        },
        "refresh" : {
          "total" : 14,
          "total_time_in_millis" : 28465,
          "external_total" : 14,
          "external_total_time_in_millis" : 28682,
          "listeners" : 0
        },
        "flush" : {
          "total" : 1,
          "periodic" : 0,
          "total_time_in_millis" : 0
        },
        "warmer" : {
          "current" : 0,
          "total" : 13,
          "total_time_in_millis" : 1
        },
        "query_cache" : {
          "memory_size_in_bytes" : 2940,
          "total_count" : 43,
          "hit_count" : 5,
          "miss_count" : 38,
          "cache_size" : 1,
          "cache_count" : 2,
          "evictions" : 1
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 5,
          "memory_in_bytes" : 92204,
          "terms_memory_in_bytes" : 63040,
          "stored_fields_memory_in_bytes" : 3864,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 9600,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 15700,
          "index_writer_memory_in_bytes" : 0,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : -1,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 18632,
          "size_in_bytes" : 318730922,
          "uncommitted_operations" : 18632,
          "uncommitted_size_in_bytes" : 318730922,
          "earliest_last_modified_age" : 0
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 0,
          "miss_count" : 0
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        }
      },
      "total" : {
        "docs" : {
          "count" : 18632,
          "deleted" : 11632
        },
        "store" : {
          "size_in_bytes" : 1196177467,
          "reserved_in_bytes" : 0
        },
        "indexing" : {
          "index_total" : 18632,
          "index_time_in_millis" : 35866,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 0,
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 48,
          "query_time_in_millis" : 406,
          "query_current" : 0,
          "fetch_total" : 48,
          "fetch_time_in_millis" : 3535,
          "fetch_current" : 0,
          "scroll_total" : 1,
          "scroll_time_in_millis" : 40896,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 1,
          "current_docs" : 30264,
          "current_size_in_bytes" : 524525951,
          "total" : 1,
          "total_time_in_millis" : 48685,
          "total_docs" : 27812,
          "total_size_in_bytes" : 491844979,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 11209,
          "total_auto_throttle_in_bytes" : 17331834
        },
        "refresh" : {
          "total" : 14,
          "total_time_in_millis" : 28465,
          "external_total" : 14,
          "external_total_time_in_millis" : 28682,
          "listeners" : 0
        },
        "flush" : {
          "total" : 1,
          "periodic" : 0,
          "total_time_in_millis" : 0
        },
        "warmer" : {
          "current" : 0,
          "total" : 13,
          "total_time_in_millis" : 1
        },
        "query_cache" : {
          "memory_size_in_bytes" : 2940,
          "total_count" : 43,
          "hit_count" : 5,
          "miss_count" : 38,
          "cache_size" : 1,
          "cache_count" : 2,
          "evictions" : 1
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 5,
          "memory_in_bytes" : 92204,
          "terms_memory_in_bytes" : 63040,
          "stored_fields_memory_in_bytes" : 3864,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 9600,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 15700,
          "index_writer_memory_in_bytes" : 0,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : -1,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 18632,
          "size_in_bytes" : 318730922,
          "uncommitted_operations" : 18632,
          "uncommitted_size_in_bytes" : 318730922,
          "earliest_last_modified_age" : 0
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 0,
          "miss_count" : 0
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        }
      }
    }
  }
}
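
The numbers that matter in the dump above are `docs.count` and `docs.deleted`. As a rough sketch (assuming the `_stats` response shape shown above, and with `deleted_summary` being a made-up helper name), they can be pulled out and compared between runs instead of scanning the full response by eye:

```python
import json


def deleted_summary(stats, index_name):
    """Return (live, deleted, deleted_fraction) for one index's primaries."""
    docs = stats["indices"][index_name]["primaries"]["docs"]
    live, deleted = docs["count"], docs["deleted"]
    total = live + deleted
    # Guard against an empty index to avoid dividing by zero.
    fraction = deleted / total if total else 0.0
    return live, deleted, fraction


# Trimmed copy of the response above, keeping only the fields used here.
sample = json.loads("""
{
  "indices": {
    "research-index": {
      "primaries": {"docs": {"count": 18632, "deleted": 11632}}
    }
  }
}
""")

live, deleted, fraction = deleted_summary(sample, "research-index")
print(f"live={live} deleted={deleted} deleted_fraction={fraction:.2f}")
```

In a real run, the `sample` value would be replaced by the parsed output of the `_stats` call, captured once after each step, so the change in the deleted count is easy to see.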

Hmm, this may be because KNN was not enabled when I initially created the index. Before indexing any documents, I sent a PUT request to enable it and set the space type to cosinesimil:

curl -XPOST 'localhost:9205/research-index/_close'

curl -X PUT "http://localhost:9205/research-index/_settings?pretty" -H 'Content-Type: application/json' -d' {"index": {"knn": true, "knn.space_type": "cosinesimil"}}'

curl -XPOST 'localhost:9205/research-index/_open'

curl -XPOST 'localhost:9205/research-index/_update_by_query?pretty'

Right now I am trying to enable KNN and set the space type in the initial PUT request when I create the index (in the same JSON with mappings). I am currently testing and will let you know if this addresses the issue.
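
For anyone following along, here is a minimal sketch of what such a single create-index request could look like. It assumes the same host and index name as the curl commands in this thread and two 100-dimension vector fields; the field names `title_vector` and `body_vector` are hypothetical placeholders, and the `knn_vector` mapping is written the way I understand the k-NN plugin expects it, so check it against your own mapping.

```python
import json
import urllib.request

# Assumed values: host/port and index name come from the curl commands in
# this thread; the vector field names used below are hypothetical.
ES_URL = "http://localhost:9205"
INDEX = "research-index"


def build_create_body(vector_fields, dimension=100):
    """Build a create-index body that enables KNN at creation time."""
    properties = {
        name: {"type": "knn_vector", "dimension": dimension}
        for name in vector_fields
    }
    return {
        "settings": {
            "index": {
                "knn": True,
                "knn.space_type": "cosinesimil",
            }
        },
        "mappings": {"properties": properties},
    }


def create_index(body):
    """Send PUT /<index> with settings and mappings in one request."""
    req = urllib.request.Request(
        f"{ES_URL}/{INDEX}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    body = build_create_body(["title_vector", "body_vector"])
    print(json.dumps(body, indent=2))
    # Uncomment to create the index on a running cluster:
    # print(create_index(body))
```

The point of the sketch is only that the KNN settings and the mappings travel in the same request, so there is no window in which the index exists without KNN enabled.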

Yes, it looks like that was the issue. If you enable KNN with a PUT to _settings after the index has already been created (in this case also setting the space type to cosinesimil), it fails silently: a subset of documents is ignored, and those documents are counted as deleted for some reason.
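
To confirm the fix on a rebuilt index, the exact-vector check from earlier in the thread can be repeated. The sketch below only builds the KNN query body and checks a response for a given document id; the helper names, the field name, and the sample response are hypothetical, and sending the request is left to whatever client you already use.

```python
def build_knn_query(field, vector, k=10):
    """Build a KNN search body for the given vector field."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


def is_recalled(search_response, doc_id):
    """Return True if doc_id appears among the hits of a search response."""
    hits = search_response.get("hits", {}).get("hits", [])
    return any(hit.get("_id") == doc_id for hit in hits)


# Hypothetical response, trimmed to the fields used above.
fake_response = {"hits": {"hits": [{"_id": "doc-1", "_score": 1.0}]}}
print(is_recalled(fake_response, "doc-1"))
print(is_recalled(fake_response, "doc-2"))
```

Searching with a document's own stored vector should return that document among the top hits, so a False result for such a query is a sign the recall problem is still present.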