Retrieve only array elements matching a predicate

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

  • OpenSearch version: 3.5.0
  • Server OS version: Amazon Linux 2023.10.20260302
  • OpenSearch Dashboards version: 3.5.0

Describe the issue: Given an array of objects in my search results, how can I extract only the elements matching a given condition?

  • Steps to reproduce
    • Index a document, creating a new index at the same time
      PUT testindex1/_doc/100
      {
        "patients": [
          {"name": "John Doe", "age": 56, "smoker": true},
          {"name": "Mary Major", "age": 85, "smoker": false}
        ]
      }
    • Run a query, returning one (or more) hits:
      GET testindex1/_search
      {
        "query": {
          "term": {
            "patients.smoker": {
              "value": true
            }
          }
        }
      }
    • Get some hits:
      {
        ...,
        "hits": {
          ...,
          "hits": [
            {
              ...,
              "_source": {
                "patients": [
                  {
                    "name": "John Doe",
                    "age": 56,
                    "smoker": true
                  },
                  {
                    "name": "Mary Major",
                    "age": 85,
                    "smoker": false
                  }
                ]
              }
            }
          ]
        }
      }

Is there any way to filter the "patients" array in the returned search hits, for example by only showing the array elements having patients.smoker = true?

In my specific example, this is the same condition used as a search term (I would be okay with a solution which imposed a similar restriction).

Configuration: I started my OpenSearch instance from a Docker image:
docker run -d --name opensearch -p 9200:9200 -e 'DISABLE_SECURITY_PLUGIN=true' -e 'discovery.type=single-node' opensearchproject/opensearch

@danilopiazza Welcome to the forum and thanks for the question.

Have you explored using script_fields? See the example query below:

GET testindex1/_search
{
  "query": {
    "term": {
      "patients.smoker": {
        "value": true
      }
    }
  },
  "_source": false,
  "script_fields": {
    "matching_patients": {
      "script": {
        "lang": "painless",
        "source": """
          def result = [];
          for (def p : params._source['patients']) {
            if (p['smoker'] == true) {
              result.add(p);
            }
          }
          return result;
        """
      }
    }
  }
}
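
If post-processing on the client is acceptable, the loop in the script above can also be mirrored in application code. This is only a minimal Python sketch, assuming the search response has already been parsed into a dict shaped like the hits in the question; filter_patients is a hypothetical helper name, not part of any OpenSearch client:

```python
def filter_patients(hit, predicate):
    """Return the elements of hit['_source']['patients'] that satisfy predicate.

    Hypothetical helper: mirrors the Painless loop, but runs on the client.
    """
    patients = hit.get("_source", {}).get("patients", [])
    return [p for p in patients if predicate(p)]


# Sample hit, shaped like the search response shown in the question.
hit = {
    "_source": {
        "patients": [
            {"name": "John Doe", "age": 56, "smoker": True},
            {"name": "Mary Major", "age": 85, "smoker": False},
        ]
    }
}

# Keep only the array elements where smoker is true.
smokers = filter_patients(hit, lambda p: p.get("smoker") is True)
print(smokers)
```

The trade-off is that the full _source still travels over the network, whereas the script_fields approach does the filtering on the server.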