Read incremental logs from opensearch index as input

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

Describe the issue:

Hello, I’m trying to use Data Prepper to read logs from an OpenSearch index (as input) and send them to an external server. For testing, I’m reading from the OpenSearch index and writing to a local file instead of the external server. The problem (?) is that on each interval the full content of the index is written to the file, leading to a lot of duplicates. I would like an “incremental read” from the index, so that only new logs that have not already been written end up in the file. Is there any configuration to make this work?

Configuration:

  source:
    opensearch:
      hosts: ["https://<my-host>:9200"]
      indices:
        include:
          - index_name_regex: "<my-index>_2026-01-16"
      username: "admin"
      password: "<password>"
      connection:
        insecure: true
      scheduling:
        interval: "30s"
      acknowledgments: true

Relevant Logs or Screenshots:

@steppox I don’t believe there is a way to achieve this currently; it is marked in the code as a “todo”.

Can you elaborate on the setup you are running? How do the documents end up in that index in the first place? Perhaps configuring those original sources directly would be a more prudent approach.

Alternatively, you can use Logstash, where you can define a query with a time range; see the example below:

input {
  opensearch {
    hosts    => ["https://localhost:9200"]
    index    => "my-logs"
    query    => '{ "query": { "range": { "@timestamp": { "gte": "now-1m" } } }, "sort": [{ "@timestamp": "asc" }] }'
    schedule => "* * * * *"
  }
}

You would need to install the OpenSearch input plugin. Also note that there is no automatic way to detect new documents: the example above relies on the timestamps being correct and every batch being processed on time, otherwise you may end up with duplicate or missing documents.
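If the target ever supports writing by document id (for example another OpenSearch cluster rather than a flat file), one way to make repeated reads idempotent is to carry the source document _id through and reuse it on write, so re-reads overwrite instead of duplicating. A minimal sketch, assuming the OpenSearch input plugin supports the docinfo/docinfo_target options the way the Elasticsearch input does (hosts, index names, and credentials are placeholders):

input {
  opensearch {
    hosts          => ["https://localhost:9200"]
    index          => "my-logs"
    query          => '{ "query": { "range": { "@timestamp": { "gte": "now-1m" } } } }'
    schedule       => "* * * * *"
    docinfo        => true              # copy _index/_id of each hit into event metadata
    docinfo_target => "[@metadata][doc]"
  }
}

output {
  opensearch {
    hosts       => ["https://other-host:9200"]
    index       => "my-logs-copy"
    # reuse the source _id so a document read twice is indexed once
    document_id => "%{[@metadata][doc][_id]}"
  }
}

This does not help with a file sink, where deduplication would have to happen downstream, but it removes the duplicate problem for id-addressable destinations.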