Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): Latest, running in Docker
Describe the issue:
I am trying to follow the example to create a RAG chatbot with a conversational flow agent in Dev Tools. I have the following model:
POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.2",
  "model_format": "TORCH_SCRIPT"
}
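For completeness, after registration I retrieve the model ID from the task API and deploy the model, roughly as below (the `<task_id>` and `<model_id>` here are placeholders for the IDs returned in my run):

```
# _register returns a task_id; poll the task to get the model_id
GET /_plugins/_ml/tasks/<task_id>

# once the task reports COMPLETED, deploy the model
POST /_plugins/_ml/models/<model_id>/_deploy
```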
With the following pipeline:
PUT /_ingest/pipeline/test_population_data_pipeline
{
  "description": "text embedding pipeline",
  "processors": [
    {
      "html_strip": {
        "field": "SubmissionStoryContent",
        "target_field": "SubmissionStoryContent_clean"
      }
    },
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 100,
            "overlap_rate": 0.2,
            "tokenizer": "standard"
          }
        },
        "field_map": {
          "SubmissionSubject": "SubmissionSubject_chunk",
          "SubmissionStoryContent_clean": "SubmissionStoryContent_chunk"
        }
      },
      "text_embedding": {
        "model_id": "9W7z4pgBK5X7Z9B4zPri",
        "field_map": {
          "SubmissionSubject_chunk": "SubmissionSubject_embedding",
          "SubmissionStoryContent_chunk": "SubmissionStoryContent_embedding"
        }
      }
    }
  ]
}
Basically I have two fields, SubmissionSubject and SubmissionStoryContent. I strip HTML from SubmissionStoryContent -> SubmissionStoryContent_clean, then use the chunking processor to create chunks for both fields, and finally vectorize them using the model deployed earlier. When I simulate the pipeline, all looks good.
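The simulate call I use is along these lines (the sample field values below are made up for illustration):

```
POST /_ingest/pipeline/test_population_data_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "SubmissionSubject": "Example subject",
        "SubmissionStoryContent": "<p>Example story content.</p>"
      }
    }
  ]
}
```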
The following is the index I want to use:
PUT test_population_data1
{
  "settings": {
    "default_pipeline": "test_population_data_pipeline",
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "SubmissionSubject": {
        "type": "text"
      },
      "SubmissionSubject_embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "engine": "lucene",
          "space_type": "l2",
          "name": "hnsw",
          "parameters": {}
        }
      },
      "SubmissionStoryContent": {
        "type": "text"
      },
      "SubmissionStoryContent_embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "engine": "lucene",
          "space_type": "l2",
          "name": "hnsw",
          "parameters": {}
        }
      }
    }
  }
}
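For context, I send documents with a plain index request like the one below; the field values shown are placeholders, and the ingest pipeline is applied automatically via the index's `default_pipeline` setting:

```
POST test_population_data1/_doc
{
  "SubmissionSubject": "Example subject",
  "SubmissionStoryContent": "<p>Example story content.</p>"
}
```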
However, when I try to send a document to the index, I receive the following error:
....."index": {
  "_index": "test_population_data1",
  "_id": "JG6Z45gBK5X7Z9B4lPtk",
  "status": 400,
  "error": {
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [SubmissionSubject_embedding] of type [knn_vector] in document with id 'JG6Z45gBK5X7Z9B4lPtk'. Preview of field's value: '{knn=[-0.08113378, 0.0...............caused_by": {
      "type": "json_parse_exception",
      "reason": """Current token (START_OBJECT) not numeric, can not use numeric value accessors
 at [Source: REDACTED (StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION disabled); line: 1, column: 248]
I could not figure out where the issue is. Any ideas would be appreciated.