Append processor for vector field

Hi guys, I'm new to OpenSearch, and I'm currently developing a RAG application.

I just created an ingest pipeline:

  1. base64 → words (attachment plugin)
  2. words → chunks (text chunking processor)
  3. chunks → embeddings for each chunk (foreach processor)

In the last step, I added another pipeline that creates embeddings for each chunk.

In the new pipeline:

1. each chunk → embedding (text_embedding processor)
2. embedding → appended into an array of objects (append processor), like below

Each object looks like:

[
  {
    "text_chunks": "",
    "text_chunks_embeddings": ""
  }
]

Appending works fine, but the appended vector's data type is wrong.

I expected:

{ "text_chunks_embeddings": [45345345, 4234234, 324324, 42545] }

but I got:

{ "text_chunks_embeddings": { "0": 3498923, "1": 39048329, "2": 4535545 } }

Please help! :thinking:

I'd recommend following the official text chunking user guide; it should produce a text_chunk_embedding result like the following:

"text_chunk_embedding": [
  {
    "knn": [ ... ]
  },
  {
    "knn": [ ... ]
  },
  {
    "knn": [ ... ]
  }
]
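For reference, a minimal pipeline in the spirit of the official guide might look like this (a sketch, not your exact setup; the model_id is a placeholder and the token_limit is an assumption):

PUT _ingest/pipeline/text-chunking-embedding-pipeline
{
  "description": "Chunk passage_text, then embed each chunk",
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 384
          }
        },
        "field_map": {
          "passage_text": "passage_chunk"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "<your-model-id>",
        "field_map": {
          "passage_chunk": "text_chunk_embedding"
        }
      }
    }
  ]
}

Here the text_embedding processor handles the array of chunks itself, so no foreach or append step is needed.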

If the problem persists, can you share your pipeline configurations for the text chunking, embedding, and append processors?

Are you using an append processor? If so, please share its configuration.

PUT _ingest/pipeline/inside_foreach_chunk_embedding
{
  "description": "inside foreach chunk embedding",
  "processors": [

{
  "set": {
    "field": "text_chunks",
    "value": "{{{ _ingest._value }}}"
  }
},

{
  "set": {
    "field": "text_chunks_embeddings",
    "value": ""
  }
},

{
  "text_embedding": {
    "model_id": "bQ1J8ooBpBj3wT4HVUsb",
    "field_map": {
      "text_chunks": "text_chunks_embeddings"
    }
  }
},

{
  "append": {
    "field": "chunked_passage",
    "value": {
      "text_chunks" : "{{{ text_chunks }}}",
      "text_chunks_embeddings" : "{{{ text_chunks_embeddings }}}"
    }
  }
},

{
  "remove": {
    "field": "text_chunks"
  }
},


{
  "remove": {
    "field": "text_chunks_embeddings"
  }
}

]
}
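For anyone hitting the same symptom: the triple-mustache template "{{{ text_chunks_embeddings }}}" renders the field through its string representation instead of copying the actual array, which is likely why the vector comes back as a map-like value. One possible workaround (a sketch, untested, assuming a script processor is available in your version) is to append the real values with Painless instead of mustache templates:

{
  "script": {
    "description": "Append the chunk and its embedding as real values, not rendered strings",
    "source": "ctx.chunked_passage.add(['text_chunks': ctx.text_chunks, 'text_chunks_embeddings': ctx.text_chunks_embeddings])"
  }
}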

What is _ingest._value?

It's a sub-pipeline invoked per element by the foreach processor, like:

{
  "foreach": {
    "field": "chunked_passage",
    "processor": {
      "pipeline": {
        "name": "inside_foreach_chunk_embedding"
      }
    }
  }
}
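To illustrate, _ingest._value refers to the current array element inside a foreach. A minimal standalone example (the "tags" field is hypothetical) that uppercases every element of an array:

{
  "foreach": {
    "field": "tags",
    "processor": {
      "uppercase": {
        "field": "_ingest._value"
      }
    }
  }
}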

I've checked the ingest pipeline you provided. Could you please share a few sample documents so I can try it?

Btw, have you tried the official text chunking documentation?

  1. Text chunking - OpenSearch Documentation
  2. Text chunking - OpenSearch Documentation