Append processor for vector field

Hi everyone. I'm new to OpenSearch and I'm currently developing a RAG application.

I just created an ingest pipeline:

  1. base64 → text (attachment plugin)
  2. text → chunks (chunking processor)
  3. chunks → embeddings for each chunk (foreach processor)
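Roughly, the outer pipeline tying these steps together looks like this (a sketch; the chunking parameters and field names are illustrative, not my exact values):

```json
PUT _ingest/pipeline/rag_ingest
{
  "description": "base64 -> text -> chunks -> embeddings (illustrative sketch)",
  "processors": [
    {
      "attachment": {
        "field": "data"
      }
    },
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 384
          }
        },
        "field_map": {
          "attachment.content": "chunked_passage"
        }
      }
    },
    {
      "foreach": {
        "field": "chunked_passage",
        "processor": {
          "pipeline": {
            "name": "inside_foreach_chunk_embedding"
          }
        }
      }
    }
  ]
}
```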

In the last step, I added another pipeline that creates embeddings for each chunk.

In the new pipeline:

1. each chunk → embedding (text_embedding processor)
2. embedding → appended into an array of objects (append processor), like below

In each object:

[
  {
    "text_chunks": "",
    "text_chunks_embeddings": ""
  }
]

The appending works fine, but the appended vector's data type is wrong.

I expected { "text_chunks_embeddings": [45345345, 4234234, 324324, 42545] } but I got { "text_chunks_embeddings": { "0": 3498923, "1": 39048329, "2": 4535545 } }

Please help me.

I recommend following the official text chunking user guide, which should produce a text_chunk_embedding result like the following:

"text_chunk_embedding": [
  {
    "knn": [ ... ]
  },
  {
    "knn": [ ... ]
  },
  {
    "knn": [ ... ]
  }
]
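For reference, the pipeline from the user guide chains the text_chunking and text_embedding processors directly, with no foreach or append needed (a sketch following the guide; the model ID is a placeholder):

```json
PUT _ingest/pipeline/text-chunking-embedding-ingest-pipeline
{
  "description": "A text chunking and embedding ingest pipeline",
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 10,
            "overlap_rate": 0.2,
            "tokenizer": "standard"
          }
        },
        "field_map": {
          "passage_text": "passage_chunk"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "<your_model_id>",
        "field_map": {
          "passage_chunk": "passage_chunk_embedding"
        }
      }
    }
  ]
}
```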

If the problem persists, can you share your pipeline configurations for the text chunking, embedding, and append processors?


Are you using an append processor? If so, please share your append processor configuration.


PUT _ingest/pipeline/inside_foreach_chunk_embedding
{
  "description": "inside foreach chunk embedding",
  "processors": [
    {
      "set": {
        "field": "text_chunks",
        "value": "{{{ _ingest._value }}}"
      }
    },
    {
      "set": {
        "field": "text_chunks_embeddings",
        "value": ""
      }
    },
    {
      "text_embedding": {
        "model_id": "bQ1J8ooBpBj3wT4HVUsb",
        "field_map": {
          "text_chunks": "text_chunks_embeddings"
        }
      }
    },
    {
      "append": {
        "field": "chunked_passage",
        "value": {
          "text_chunks": "{{{ text_chunks }}}",
          "text_chunks_embeddings": "{{{ text_chunks_embeddings }}}"
        }
      }
    },
    {
      "remove": {
        "field": "text_chunks"
      }
    },
    {
      "remove": {
        "field": "text_chunks_embeddings"
      }
    }
  ]
}
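I suspect the mustache templates in the append step above are part of the problem, since they render the embedding array as text rather than passing the array through as-is. An alternative I am considering is a script processor that appends the raw values directly (an untested sketch; the target field name "passages" is arbitrary):

```json
{
  "script": {
    "lang": "painless",
    "source": "if (ctx.passages == null) { ctx.passages = []; } ctx.passages.add(['text_chunks': ctx.text_chunks, 'text_chunks_embeddings': ctx.text_chunks_embeddings]);"
  }
}
```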


What is _ingest._value?


It's the current array element inside the foreach processor, which runs the sub-pipeline like this:

{
  "foreach": {
    "field": "chunked_passage",
    "processor": {
      "pipeline": {
        "name": "inside_foreach_chunk_embedding"
      }
    }
  }
}
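For testing, the foreach plus sub-pipeline can be dry-run with the simulate API (a sketch with dummy chunks):

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "foreach": {
          "field": "chunked_passage",
          "processor": {
            "pipeline": {
              "name": "inside_foreach_chunk_embedding"
            }
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "chunked_passage": ["first chunk", "second chunk"]
      }
    }
  ]
}
```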


I've checked the ingest pipeline you provided. Could you please share a few sample documents for me to try it with?

Btw, have you tried the official text chunking documentation?

  1. Text chunking - OpenSearch Documentation
  2. Text chunking - OpenSearch Documentation