Append processor for vector field

Hi everyone. I'm new to OpenSearch and I'm currently developing a RAG application.

I just created an ingest pipeline:

  1. base64 → text (attachment plugin)
  2. text → chunks (chunking processor)
  3. chunks → embeddings for each chunk (foreach processor)
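Roughly, the outer pipeline tying these steps together looks like this (a sketch; the chunking parameters and field names are illustrative, not my exact values):

```json
PUT _ingest/pipeline/rag_ingest
{
  "description": "base64 -> text -> chunks -> embeddings (illustrative sketch)",
  "processors": [
    {
      "attachment": {
        "field": "data"
      }
    },
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 384
          }
        },
        "field_map": {
          "attachment.content": "chunked_passage"
        }
      }
    },
    {
      "foreach": {
        "field": "chunked_passage",
        "processor": {
          "pipeline": {
            "name": "inside_foreach_chunk_embedding"
          }
        }
      }
    }
  ]
}
```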

In the last step, I added another pipeline that creates embeddings for each chunk.

In the new pipeline:

1. each chunk → embedding (text_embedding processor)
2. embedding → appended into an array of objects (append processor), like below

In each object:

[
  {
    "text_chunks": "",
    "text_chunks_embeddings": ""
  }
]

The appending works fine, but the appended vector's data type is wrong.

I expected { "text_chunks_embeddings": [45345345, 4234234, 324324, 42545] } but I got { "text_chunks_embeddings": { "0": 3498923, "1": 39048329, "2": 4535545 } }

Please help me.

I recommend following the official text chunking user guide, which should produce a text_chunk_embedding result like the following:

"text_chunk_embedding": [
  {
    "knn": [ ... ]
  },
  {
    "knn": [ ... ]
  },
  {
    "knn": [ ... ]
  }
]
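For reference, the pipeline from the user guide chains the text_chunking and text_embedding processors directly, with no foreach or append needed (a sketch following the guide; the model ID is a placeholder):

```json
PUT _ingest/pipeline/text-chunking-embedding-ingest-pipeline
{
  "description": "A text chunking and embedding ingest pipeline",
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 10,
            "overlap_rate": 0.2,
            "tokenizer": "standard"
          }
        },
        "field_map": {
          "passage_text": "passage_chunk"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "<your_model_id>",
        "field_map": {
          "passage_chunk": "passage_chunk_embedding"
        }
      }
    }
  ]
}
```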

If the problem persists, can you share your pipeline configurations for the text chunking, embedding, and append processors?


Are you using an append processor? If so, please share your append processor configuration.


PUT _ingest/pipeline/inside_foreach_chunk_embedding
{
  "description": "inside foreach chunk embedding",
  "processors": [
    {
      "set": {
        "field": "text_chunks",
        "value": "{{{ _ingest._value }}}"
      }
    },
    {
      "set": {
        "field": "text_chunks_embeddings",
        "value": ""
      }
    },
    {
      "text_embedding": {
        "model_id": "bQ1J8ooBpBj3wT4HVUsb",
        "field_map": {
          "text_chunks": "text_chunks_embeddings"
        }
      }
    },
    {
      "append": {
        "field": "chunked_passage",
        "value": {
          "text_chunks": "{{{ text_chunks }}}",
          "text_chunks_embeddings": "{{{ text_chunks_embeddings }}}"
        }
      }
    },
    {
      "remove": {
        "field": "text_chunks"
      }
    },
    {
      "remove": {
        "field": "text_chunks_embeddings"
      }
    }
  ]
}
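I suspect the mustache templates in the append step above are part of the problem, since they render the embedding array as text rather than passing the array through as-is. An alternative I am considering is a script processor that appends the raw values directly (an untested sketch; the target field name "passages" is arbitrary):

```json
{
  "script": {
    "lang": "painless",
    "source": "if (ctx.passages == null) { ctx.passages = []; } ctx.passages.add(['text_chunks': ctx.text_chunks, 'text_chunks_embeddings': ctx.text_chunks_embeddings]);"
  }
}
```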


What is _ingest._value?


It's the current array element inside the foreach processor, which runs the sub-pipeline like this:

{
  "foreach": {
    "field": "chunked_passage",
    "processor": {
      "pipeline": {
        "name": "inside_foreach_chunk_embedding"
      }
    }
  }
}
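For testing, the foreach plus sub-pipeline can be dry-run with the simulate API (a sketch with dummy chunks):

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "foreach": {
          "field": "chunked_passage",
          "processor": {
            "pipeline": {
              "name": "inside_foreach_chunk_embedding"
            }
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "chunked_passage": ["first chunk", "second chunk"]
      }
    }
  ]
}
```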


I've checked the ingest pipeline you provided. Could you please share a few sample documents for me to try it with?

Btw, have you tried the official text chunking documentation?

  1. Text chunking - OpenSearch Documentation
  2. Text chunking - OpenSearch Documentation