Some documents are intermittently missing from OpenSearch search results

Problem Description

Hello OpenSearch community,

We’re dealing with a difficult problem in our OpenSearch setup and could use your help. Intermittently, documents that should match a search are missing from the results, and we haven’t been able to find the cause.

Our Setup:

• OpenSearch: v2.17.0 (upgraded from 2.4)
• Server: Ubuntu 20.04
• Kubernetes (v1.25.6) with 1 OpenSearch Cluster, 1 Scheduler App, and 1 Service App
• Single-node setup (1 primary shard, 0 replicas)

The Problem:

• When we search, some documents that should be in the results are not there.
• It doesn’t happen on every search, but it keeps recurring even after we’ve tried several fixes.
• It’s not always the same documents that are missing.
• We’ve checked, and the missing documents do exist in the index.
• It happens even when the query should definitely match the missing documents.

Example of Inconsistent Results:

Here’s an example of how the results can be inconsistent (this is our best reconstruction of what might be happening):

  1. Add 3 new items.
  2. Scheduler 1st run - queries the new item IDs to check whether they already exist; 0 are found, so all 3 items are indexed.
  3. Scheduler 2nd run - queries by the latest timestamp of the newly added items (roughly the time-based lookup sketched below); 1 item is found and is not updated because of our duplicate-item logic.
  4. Add 1 new item.
  5. Scheduler 3rd run - queries the existing items plus the new one, so only 1 item should be added. But because documents are missing, the search returns 0 items, the duplicate filtering fails, and an item is added again << duplication occurs
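
For reference, the time-based lookup in the 2nd run is shaped roughly like the query below. This is only a sketch: the timestamp field name (created-at) and the date value are placeholders rather than our exact mapping, and the two term filters mirror the ones in the “Our Query” section further down.

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "channel-type.keyword": "ChannelType" } },
        { "term": { "resource-type.keyword": "ResourceType" } },
        { "range": { "created-at": { "gte": "2024-06-01T12:00:00Z" } } }
      ]
    }
  }
}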

What We’ve Tried:

  1. We removed replica shards because we’re running a single node.
  2. We upgraded OpenSearch from version 2.4 to 2.17.
  3. We improved our duplicate filtering:
    • We now retry the search when we get zero results.
    • We log the cases where documents show up only in the retry searches.
    • We return the correct document after confirming it exists.
    This helped a little, but we still see problems, especially with newly added items.
  4. For bulk indexing, we use the “refresh=wait_for” parameter on each request so that changes are visible to search before the request returns (a simplified example is below).
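
For context, here is a simplified version of one of our bulk indexing requests. The index names, document IDs, and field values are placeholders (the real documents carry more fields), but the refresh=wait_for usage is what we actually send:

POST /_bulk?refresh=wait_for
{ "index": { "_index": "item-ItemId1-2024.06", "_id": "doc-1" } }
{ "channel-type": "ChannelType", "resource-type": "ResourceType", "item": { "id": "ItemId1" } }
{ "index": { "_index": "item-ItemId2-2024.06", "_id": "doc-2" } }
{ "channel-type": "ChannelType", "resource-type": "ResourceType", "item": { "id": "ItemId2" } }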

Our Query:

We’re using the following query structure:

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "channel-type.keyword": "ChannelType"
          }
        },
        {
          "term": {
            "resource-type.keyword": "ResourceType"
          }
        },
        {
          "terms": {
            "item.id": ["ItemId1", "ItemId2", "..."]
          }
        }
      ]
    }
  }
}

Our Questions:

1. Caching Issues:

• When we search with the same queries, we often get the same incomplete results.
• We update most document properties; channel-type, resource-type, and item.id are the only ones that never change. Shouldn’t searches return the updated results?
• We noticed something strange: after a recent update we saw only 3 missing-document cases in the first 3 days, but over 100 cases after 6 days.
• Do you know what might cause this? Could Kubernetes caching be affecting our results?

2. Scheduler Problems:

• We have two schedulers that index documents:
a) One that runs every minute for recently added items
b) One that runs daily for all items
• Both of them write to the same per-item monthly indices (named like item-{itemId}-{yyyy.MM}).
• They’re supposed to run one after the other, but we’re worried they might be interfering with each other.
• They both use the Bulk API for updates (see the sketch below). Could this be causing documents to go missing?
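
To make the concern concrete: both schedulers can end up sending a bulk update for the same document in the same monthly index within a short window. A simplified sketch follows (the document ID and the status field are made-up placeholders, not our real schema):

# per-minute scheduler
POST /_bulk?refresh=wait_for
{ "update": { "_index": "item-ItemId1-2024.06", "_id": "doc-1" } }
{ "doc": { "status": "minute-run" } }

# daily scheduler, shortly afterwards
POST /_bulk?refresh=wait_for
{ "update": { "_index": "item-ItemId1-2024.06", "_id": "doc-1" } }
{ "doc": { "status": "daily-run" } }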

3. Query Issues:

• Could our query structure be causing documents to drop out of the results?
• We’re using bool queries with filters on channel-type.keyword, resource-type.keyword, and a list of item.ids as shown in the query above.

4. Refresh Issues:

• We’re using “refresh=wait_for” with our bulk index requests.
• Could there be any issues with this approach that might lead to inconsistent search results?

We hope someone in the community has seen something like this before or has ideas we haven’t thought of. We’ve been looking at this problem for a long time and need some new perspectives.

Thank you for any help you can give. We really appreciate your time and expertise!