Transform Job "Failed to get the modified buckets in source indices"

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.4.0

Describe the issue:
I’ve configured a transform job to count the documents ingested per agent.hostname per minute, to get some insight into whether the servers are working as expected. The job fails at what seem to be random intervals, and the logs emitted by the data nodes contain no details about why.

The transform job stops with the error “Failed to get the modified buckets in source indices”. How can I investigate this further and resolve whatever makes the job fail? It’s configured for continuous operation, and when it stops aggregating the document counts, the external consumer of the data stops working.

Configuration:

{
  "_id" : "count_docs_pr_hostname_minute",
  "_version" : 4,
  "_seq_no" : 153407968,
  "_primary_term" : 115,
  "transform" : {
    "transform_id" : "count_docs_pr_hostname_minute",
    "schema_version" : 17,
    "schedule" : {
      "interval" : {
        "start_time" : 1708942249057,
        "period" : 1,
        "unit" : "Minutes"
      }
    },
    "metadata_id" : "GUdogz3_rvq1SFZM_9ElYw",
    "updated_at" : 1709113842260,
    "enabled" : true,
    "enabled_at" : 1709113842260,
    "description" : "Count documents per agent.hostname per minute",
    "source_index" : "companyname*",
    "data_selection_query" : {
      "match_all" : {
        "boost" : 1.0
      }
    },
    "target_index" : "prefix-doccount",
    "page_size" : 1000,
    "groups" : [
      {
        "terms" : {
          "source_field" : "agent.hostname",
          "target_field" : "agent_hostname"
        }
      },
      {
        "date_histogram" : {
          "fixed_interval" : "1m",
          "source_field" : "@timestamp",
          "target_field" : "timestamp_minute",
          "timezone" : "UTC"
        }
      }
    ],
    "aggregations" : {
      "documents_count" : {
        "value_count" : {
          "field" : "agent.hostname"
        }
      }
    },
    "continuous" : true
  }
}
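To try reproducing the search failure outside the transform job, I sketched a composite aggregation that roughly mirrors the groups and aggregation in the config above. This is my own approximation, not the exact request the index management plugin sends; the function name and the aggregation name transform_buckets are made up for illustration.

```python
import json


def build_composite_query(after_key=None, page_size=1000):
    """Build a search body that approximates one page of the transform."""
    composite = {
        "size": page_size,  # same value as page_size in the transform config
        "sources": [
            {"agent_hostname": {"terms": {"field": "agent.hostname"}}},
            {
                "timestamp_minute": {
                    "date_histogram": {
                        "field": "@timestamp",
                        "fixed_interval": "1m",
                        "time_zone": "UTC",
                    }
                }
            },
        ],
    }
    if after_key is not None:
        # Resume paging from a known bucket, e.g. an after_key reported by _explain.
        composite["after"] = after_key
    return {
        "size": 0,  # only the aggregation buckets are of interest
        "query": {"match_all": {}},
        "aggs": {
            "transform_buckets": {
                "composite": composite,
                "aggs": {
                    "documents_count": {"value_count": {"field": "agent.hostname"}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_composite_query(), indent=2))
```

The printed body can be sent to the _search endpoint of the source index pattern (companyname*). If the search fails there as well, the response should at least include the underlying error.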

Relevant Logs or Screenshots:

I’ve been out of office, and now that I’m back the transform job has crashed again, this time with “Failed to search data in source indices”. This makes very little sense to me, as the indices targeted by the transform job have tons of data…

After going through all the log files I found the lines below, but they don’t provide much to go on when looking for what’s causing this. Are there any configuration options that can be changed to make OpenSearch emit more detailed information when running transform jobs?
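One thing I’m considering is raising the log level for the transform package through the dynamic logger.<package> cluster setting. The package name below is taken from the stack trace; whether the plugin actually logs anything more useful at DEBUG is an assumption on my part. A small sketch that builds the request body (the helper name is made up):

```python
import json

# Package name as it appears in the stack trace of the failing job.
TRANSFORM_LOGGER = "logger.org.opensearch.indexmanagement.transform"


def logger_settings_body(level="DEBUG", persistent=False):
    """Return a _cluster/settings body that sets the transform logger level.

    Passing level=None resets the logger to its default.
    """
    scope = "persistent" if persistent else "transient"
    return {scope: {TRANSFORM_LOGGER: level}}


if __name__ == "__main__":
    print("PUT _cluster/settings")
    print(json.dumps(logger_settings_body(), indent=2))
```

Sending the same body with the level set to null afterwards should put the logger back to its default.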

/opensearch-lvm/logs/sb-logs-2024-02-29-1.log.gz:[2024-02-29T14:48:26,007][ERROR][o.o.i.t.TransformRunner  ] [osl-ask] Failed to execute the transform job [count_docs_pr_hostname_minute] because of exception [Failed to search data in source indices]
/opensearch-lvm/logs/sb-logs-2024-02-29-1.log.gz:org.opensearch.indexmanagement.transform.exceptions.TransformSearchServiceException: Failed to search data in source indices
/opensearch-lvm/logs/sb-logs-2024-02-29-1.log.gz:	at org.opensearch.indexmanagement.transform.TransformSearchService.executeCompositeSearch(TransformSearchService.kt:218) ~[opensearch-index-management-2.4.0.0.jar:2.4.0.0]
/opensearch-lvm/logs/sb-logs-2024-02-29-1.log.gz:	at org.opensearch.indexmanagement.transform.TransformSearchService$executeCompositeSearch$1.invokeSuspend(TransformSearchService.kt) ~[opensearch-index-management-2.4.0.0.jar:2.4.0.0]
/opensearch-lvm/logs/sb-logs-2024-02-29-1.log.gz:	at org.opensearch.indexmanagement.transform.TransformRunner.executeJob(TransformRunner.kt:173) [opensearch-index-management-2.4.0.0.jar:2.4.0.0]
/opensearch-lvm/logs/sb-logs-2024-02-29-1.log.gz:	at org.opensearch.indexmanagement.transform.TransformRunner.access$executeJob(TransformRunner.kt:41) [opensearch-index-management-2.4.0.0.jar:2.4.0.0]
/opensearch-lvm/logs/sb-logs-2024-02-29-1.log.gz:	at org.opensearch.indexmanagement.transform.TransformRunner$executeJob$1.invokeSuspend(TransformRunner.kt) [opensearch-index-management-2.4.0.0.jar:2.4.0.0]
/opensearch-lvm/logs/sb-logs-2024-02-29-1.log.gz:[2024-02-29T14:48:26,014][INFO ][o.o.i.t.TransformRunner  ] [osl-ask] Disabling the transform job count_docs_pr_hostname_minute

Output from _explain endpoint:

{
  "count_docs_pr_hostname_minute" : {
    "metadata_id" : "GUdogz3_rvq1SFZM_9ElYw",
    "transform_metadata" : {
      "transform_id" : "count_docs_pr_hostname_minute",
      "after_key" : {
        "timestamp_minute" : 1699799100000,
        "agent_hostname" : "another.hostname"
      },
      "last_updated_at" : 1709218106012,
      "status" : "failed",
      "failure_reason" : "Failed to search data in source indices",
      "stats" : {
        "pages_processed" : 40536,
        "documents_processed" : 2676854779,
        "documents_indexed" : 2733103,
        "index_time_in_millis" : 635127,
        "search_time_in_millis" : 153291637
      },
      "continuous_stats" : { }
    }
  }
}
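To make the numbers in the _explain output easier to read, here is a small sketch that converts the epoch-millisecond values to UTC and derives per-page averages. The values are copied from the output above; the dictionary layout and helper name are only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the _explain output.
explain = {
    "after_key_timestamp_minute": 1699799100000,
    "last_updated_at": 1709218106012,
    "pages_processed": 40536,
    "documents_processed": 2676854779,
    "search_time_in_millis": 153291637,
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


after_key_time = millis_to_utc(explain["after_key_timestamp_minute"])
last_updated = millis_to_utc(explain["last_updated_at"])

# Distance between the last processed bucket and the time of the failure.
lag = last_updated - after_key_time

# Average cost of one page of the composite search.
avg_search_ms = explain["search_time_in_millis"] / explain["pages_processed"]
avg_docs_per_page = explain["documents_processed"] / explain["pages_processed"]

if __name__ == "__main__":
    print("after_key bucket:", after_key_time.isoformat())
    print("last updated:    ", last_updated.isoformat())
    print("lag in days:     ", lag.days)
    print("avg search ms per page:", round(avg_search_ms, 1))
    print("avg docs per page:     ", round(avg_docs_per_page, 1))
```

The after_key bucket is from 2023-11-12, while last_updated_at matches the failure timestamp in the log (2024-02-29T14:48:26), and each page took close to 3.8 seconds of search time on average. I don’t know whether either of these is related to the failure.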
