Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.8/2.9 (Neural Search Plugin 2.8/2.9)
Server OS: Linux
Describe the issue:
We need the Neural Search Plugin to support generating vector values for fields inside a nested field type. When we tried the plugin, we found that it does not support this.
In our case, we split a large document into multiple chunks and save them in an array as elements of a nested field. We expect the Neural Search Plugin to generate a vector for each element of the nested field and store it alongside the original element.
Currently, the Neural Search Plugin appears to support generating vectors only for an array of strings, and it stores the generated vectors in a separate array.
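To make the expected behaviour concrete, here is a minimal Python sketch of what we would like the processor to do for each element of the nested array. It is not plugin code: `fake_embed` and `embed_nested` are hypothetical names, and `fake_embed` is only a stand-in for the real model call that returns a dummy vector of the right length.

```python
from typing import Callable, Dict, List

Vector = List[float]


def fake_embed(text: str, dimension: int = 384) -> Vector:
    """Hypothetical stand-in for the model call; returns a deterministic dummy vector."""
    seed = sum(ord(ch) for ch in text)
    return [((seed + i) % 100) / 100.0 for i in range(dimension)]


def embed_nested(
    doc: dict,
    nested_field: str,
    field_map: Dict[str, str],
    embed: Callable[[str], Vector] = fake_embed,
) -> dict:
    """Return a copy of doc in which every element of doc[nested_field]
    gains a vector for each source -> target pair in field_map."""
    enriched = []
    for element in doc.get(nested_field, []):
        new_element = dict(element)  # keep the original text next to its vector
        for source, target in field_map.items():
            if source in element:
                new_element[target] = embed(element[source])
        enriched.append(new_element)
    result = dict(doc)
    result[nested_field] = enriched
    return result


doc = {
    "id": "1",
    "question_embeddings": [
        {"text": "Eric is an engineer"},
        {"text": "and he uses OpenSearch"},
    ],
}
out = embed_nested(doc, "question_embeddings", {"text": "text_embedding"})
```

Each element of `out["question_embeddings"]` keeps its `text` and gains a `text_embedding` of length 384, which is the document shape we expect after ingestion.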
Configuration:
In the following example, "question_embeddings" is a nested field with two subfields, "text" and "text_embedding". "text" contains the value to be vectorized, and "text_embedding" will hold the vector generated from the value in "text".
; In "demo_pipeline", "text" is mapped to "text_embedding" in the "field_map".
PUT _ingest/pipeline/demo_pipeline
{
  "description": "ML search test pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "8-S0XYsBFQTip4T18k-U",
        "field_map": {
          "question_embeddings": {
            "text": "text_embedding"
          }
        }
      }
    }
  ]
}
; The following demo_index uses demo_pipeline as its default pipeline.
PUT demo_index
{
  "settings": {
    "number_of_replicas": "0",
    "index.knn": true,
    "default_pipeline": "demo_pipeline"
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "question_embeddings": {
        "type": "nested",
        "properties": {
          "text": {
            "type": "text"
          },
          "text_embedding": {
            "type": "knn_vector",
            "dimension": 384,
            "method": {
              "name": "hnsw",
              "space_type": "l2",
              "engine": "nmslib"
            }
          }
        }
      }
    }
  }
}
Data to be ingested:
POST demo_index/_doc
{"id": "1", "question_embeddings": [{"text": "Eric is an engineer"}, {"text": "and he uses OpenSearch"}]}
Expected results:
The following document should be ingested, with a vector generated in the "text_embedding" field of each nested element:
{
  "id": "1",
  "question_embeddings": [
    {"text": "Eric is an engineer", "text_embedding": "<Neural Search Plugin generated vector value for the text field>"},
    {"text": "and he uses OpenSearch", "text_embedding": "<Neural Search Plugin generated vector value for the text field>"}
  ]
}
Relevant Logs or Screenshots:
The ingest request returns the following error:
{"error":{"root_cause":[{"type":"class_cast_exception","reason":"class java.util.ArrayList cannot be cast to class java.util.Map (java.util.ArrayList and java.util.Map are in module java.base of loader 'bootstrap')"}],"type":"class_cast_exception","reason":"class java.util.ArrayList cannot be cast to class java.util.Map (java.util.ArrayList and java.util.Map are in module java.base of loader 'bootstrap')"},"status":500}
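Our reading of this error, which is only a guess from the message and not from the plugin source, is that the processor expects the value under a nested "field_map" key to be a single object (a Map) and fails when it finds an array (an ArrayList). The toy Python model below illustrates that assumption; `walk_field_map` is a hypothetical name and does not correspond to any plugin function.

```python
def walk_field_map(source: dict, field_map: dict) -> list:
    """Toy traversal that assumes a nested field_map entry always points at an object.

    Collects the text values that would be sent to the model.
    """
    texts = []
    for key, target in field_map.items():
        value = source.get(key)
        if isinstance(target, dict):
            # Assumed behaviour: a nested mapping is only valid on top of an object.
            if not isinstance(value, dict):
                raise TypeError(
                    f"{type(value).__name__} cannot be treated as a map for field '{key}'"
                )
            texts.extend(walk_field_map(value, target))
        elif isinstance(value, str):
            texts.append(value)
    return texts


field_map = {"question_embeddings": {"text": "text_embedding"}}

# A single object under the nested key is handled.
as_object = {"question_embeddings": {"text": "Eric is an engineer"}}
object_texts = walk_field_map(as_object, field_map)

# An array of objects, as in our document, trips the type assumption.
as_array = {
    "question_embeddings": [
        {"text": "Eric is an engineer"},
        {"text": "and he uses OpenSearch"},
    ]
}
try:
    walk_field_map(as_array, field_map)
    array_error = None
except TypeError as err:
    array_error = str(err)
```

If this reading is correct, supporting our use case would mean iterating over the array and applying the nested mapping to each element instead of casting the whole value to a Map.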