Arun
1
Hi guys, I am new to OpenSearch and I am currently developing a RAG application.
I just created an ingest pipeline:
- base64 → words (attachment plugin)
- words → chunks (text chunking processor)
- chunks → embeddings for each chunk (foreach processor)
In the last step, I added another pipeline to create embeddings for each chunk.
In the new pipeline:
1. each chunk → embedding (text_embedding processor)
2. embedding → appended into an array of objects (append processor), with objects shaped like below:
[
  {
    "text_chunks": "",
    "text_chunks_embeddings": ""
  }
]
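A sketch of the top-level pipeline described above (the field names, chunking parameters, and sub-pipeline name here are assumptions, not the exact configuration):

PUT _ingest/pipeline/rag_ingest
{
  "processors": [
    {
      "attachment": {
        "field": "data"
      }
    },
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": { "token_limit": 384 }
        },
        "field_map": { "attachment.content": "chunked_passage" }
      }
    },
    {
      "foreach": {
        "field": "chunked_passage",
        "processor": {
          "pipeline": { "name": "inside_foreach_chunk_embedding" }
        }
      }
    }
  ]
}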
The appending works fine, but the appended vectors' data type is wrong.
I expected { "text_chunks_embeddings": [45345345, 4234234, 324324, 42545] } but I got { "text_chunks_embeddings": { "0": 3498923, "1": 39048329, "2": 4535545 } }.
Please help me.
I recommend following the official text chunking user guide; you should obtain a text_chunk_embedding result like this:
"text_chunk_embedding": [
{
"knn": [ ... ]
},
{
"knn": [ ... ]
},
{
"knn": [ ... ]
}
]
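As a sketch of the documented approach (the token limit and model ID below are placeholders, not your values): a single pipeline can chunk and embed without a foreach, because the text_embedding processor applied to an array field emits one embedding per chunk.

PUT _ingest/pipeline/text-chunking-embedding-ingest-pipeline
{
  "description": "A text chunking and embedding ingest pipeline",
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 10,
            "overlap_rate": 0.2,
            "tokenizer": "standard"
          }
        },
        "field_map": {
          "passage_text": "passage_chunk"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "<your_model_id>",
        "field_map": {
          "passage_chunk": "passage_chunk_embedding"
        }
      }
    }
  ]
}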
If your problem persists, can you share your pipeline configurations for the text chunking, embedding, and append processors?
Are you using an append processor? If so, please share your append processor configuration.
Arun
4
PUT _ingest/pipeline/inside_foreach_chunk_embedding
{
  "description": "inside foreach chunk embedding",
  "processors": [
{
"set": {
"field": "text_chunks",
"value": "{{{ _ingest._value }}}"
}
},
{
"set": {
"field": "text_chunks_embeddings",
"value": ""
}
},
{
"text_embedding": {
"model_id": "bQ1J8ooBpBj3wT4HVUsb",
"field_map": {
"text_chunks": "text_chunks_embeddings"
}
}
},
{
"append": {
"field": "chunked_passage",
"value": {
"text_chunks" : "{{{ text_chunks }}}",
"text_chunks_embeddings" : "{{{ text_chunks_embeddings }}}"
}
}
},
{
"remove": {
"field": "text_chunks"
}
},
{
"remove": {
"field": "text_chunks_embeddings"
}
}
]
}
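For reference: mustache templates like {{{ text_chunks_embeddings }}} in the append processor render the value instead of passing the array through unchanged, which can produce the index-keyed object shown earlier. One possible workaround (a sketch, assuming Painless scripting is available on your cluster) is to append the raw field values with a script processor instead of the append processor:

{
  "script": {
    "source": "if (ctx.chunked_passage == null) { ctx.chunked_passage = []; } ctx.chunked_passage.add(['text_chunks': ctx.text_chunks, 'text_chunks_embeddings': ctx.text_chunks_embeddings])"
  }
}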
Arun
6
It's a sub-pipeline invoked from a foreach processor, like this:
{
  "foreach": {
    "field": "chunked_passage",
    "processor": {
      "pipeline": {
        "name": "inside_foreach_chunk_embedding"
      }
    }
  }
}
I've checked the ingest pipeline you provided. Could you please share a few sample documents so I can try it?
By the way, have you tried the official text chunking documentation?
- Text chunking - OpenSearch Documentation
- Text chunking - OpenSearch Documentation
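To make results easy to reproduce, the pipeline can also be exercised against a sample document with the simulate API. A minimal sketch (the sample field values below are placeholders):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "foreach": {
          "field": "chunked_passage",
          "processor": {
            "pipeline": { "name": "inside_foreach_chunk_embedding" }
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "chunked_passage": ["first sample chunk", "second sample chunk"]
      }
    }
  ]
}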