Loss of nested fields when using pipeline

Hello. In OpenSearch 2.19 we are losing fields of nested objects when documents pass through an ingest pipeline. When we index a document that contains arrays of objects, only one field of each object ends up in the index.

In the example below, only the “descr” field of each object is kept. The “name” and “id” fields are missing. Version 2.18 does not have this problem.

Example simulate query:

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "text_embedding": {
          "model_id": "BeQ3O5UBwVQ2gq6gyMSO",
          "field_map": {
            "ml_text": "ml_embedding"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "test_index",
      "_source": {
        "nested_field": [
          {
            "id": "value1",
            "name": "value1",
            "descr": "value1"
          },
          {
            "id": "value2",
            "name": "value2",
            "descr": "value2"
          }
        ],
        "categories": [
          {
            "id": "value3",
            "name": "value3",
            "descr": "value3"
          },
          {
            "id": "value4",
            "name": "value4",
            "descr": "value4"
          }
        ]
      }
    }
  ]
}

Result:

{
  "docs": [
    {
      "doc": {
        "_index": "test_index",
        "_id": "_id",
        "_source": {
          "nested_field": [
            {
              "descr": "value1"
            },
            {
              "descr": "value2"
            }
          ],
          "categories": [
            {
              "descr": "value3"
            },
            {
              "descr": "value4"
            }
          ]
        },
        "_ingest": {
          "timestamp": "2025-03-18T12:15:00.656183009Z"
        }
      }
    }
  ]
}
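
To check other documents for the same loss, the `_source` sent to `_simulate` can be compared with the `_source` that comes back. The following is only a rough sketch: `missing_paths` is a hypothetical helper written for this thread, not part of OpenSearch or any client library, and it assumes both documents are already loaded as plain Python dicts and lists (for example with `json.loads`). It reports every key that exists in the input but not in the output, and ignores fields the pipeline adds.

```python
def missing_paths(before, after, prefix=""):
    """Return paths of keys/items present in `before` but absent from `after`.

    Hypothetical helper for comparing a document's _source before and
    after an ingest pipeline. Fields added by the pipeline are ignored.
    """
    missing = []
    if isinstance(before, dict):
        if not isinstance(after, dict):
            return [prefix or "<root>"]
        for key, value in before.items():
            path = f"{prefix}.{key}" if prefix else key
            if key not in after:
                missing.append(path)
            else:
                missing.extend(missing_paths(value, after[key], path))
    elif isinstance(before, list):
        if not isinstance(after, list):
            return [prefix or "<root>"]
        for i, item in enumerate(before):
            path = f"{prefix}[{i}]"
            if i >= len(after):
                missing.append(path)
            else:
                missing.extend(missing_paths(item, after[i], path))
    return missing


# Input _source and returned _source, shortened to one of the two arrays
# from the example in this thread.
source = {
    "nested_field": [
        {"id": "value1", "name": "value1", "descr": "value1"},
        {"id": "value2", "name": "value2", "descr": "value2"},
    ]
}
result = {
    "nested_field": [
        {"descr": "value1"},
        {"descr": "value2"},
    ]
}

lost = missing_paths(source, result)
for path in lost:
    print("missing:", path)
```

An empty list means no input field was dropped. For the example above it lists the `id` and `name` paths of both array elements.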

This bug is fixed in 2.19.2. See "[BUG] text_embedding truncates key-value maps in a pipeline", Issue #3804 in opensearch-project/ml-commons on GitHub.


In OpenSearch 2.19, the text_embedding processor may strip unmapped fields from nested objects, which 2.18 did not do. To fix this, define nested mappings and make sure every nested field you need is either listed explicitly in field_map or handled by an appropriate nested-aware processor.