OpenSearch Managed Cluster version 2.15
I have set up an ingestion pipeline as follows:
version: '2'
log-pipeline:
  source:
    s3:
      codec:
        parquet:
      compression: none
      aws:
        region: us-east-1
        sts_role_arn: '<ARN>'
      acknowledgments: true
      scan:
        buckets:
          - bucket:
              name: test
              filter:
                include_prefix:
                  - embeddings/
      delete_s3_objects_on_read: false
  processor:
    - date:
        destination: 'ingested_at'
        from_time_received: true
  sink:
    - opensearch:
        hosts: [<HOST>]
        index: 'test'
        aws:
          sts_role_arn: '<ARN>'
          region: us-east-1
        dlq:
          s3:
            bucket: test-dlq
            region: us-east-1
            sts_role_arn: '<ARN>'
An example of the Polars dataframe saved to parquet is as follows:
timestamp   type  vector
i64         str   list[f64]
1727076649  a     [0.042296, 0.047431, … -0.010195]
1727093762  b     [0.0, 0.0, … 0.0]
1727052674  a     [-0.062857, -0.040043, … -0.039441]
My index template is as follows:
{
  "index_patterns": [
    "test*"
  ],
  "template": {
    "settings": {
      "index.knn": true,
      "number_of_replicas": 0
    },
    "mappings": {
      "properties": {
        "timestamp": {
          "type": "integer"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "vector": {
          "type": "knn_vector",
          "dimension": 300,
          "method": {
            "engine": "nmslib",
            "space_type": "l2",
            "name": "hnsw",
            "parameters": {}
          }
        }
      }
    }
  }
}
In the logs, I’m getting the following errors:
2024-11-13T19:31:45.154 [log-pipeline-sink-worker-2-thread-2] WARN org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy - index = test, operation = Index, status = 400, error = failed to parse field [vector] of type [knn_vector] in document with id 'tGwCJ5MBaH-WR-jbpXBx'. Preview of field's value: '{element=-0.109751857817173}'
2024-11-13T19:31:45.154 [log-pipeline-sink-worker-2-thread-2] WARN org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy - index = test, operation = Index, status = 400, error = failed to parse field [vector] of type [knn_vector] in document with id 'smwCJ5MBaH-WR-jbpXBx'. Preview of field's value: '{element=-0.06285733729600906}'
2024-11-13T19:31:45.154 [log-pipeline-sink-worker-2-thread-2] WARN org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy - index = test, operation = Index, status = 400, error = failed to parse field [vector] of type [knn_vector] in document with id 'sWwCJ5MBaH-WR-jbpXBx'. Preview of field's value: '{element=0.0}'
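The value previews suggest that each list item reaches the sink as a single-field object (`{element=...}`) rather than as a bare number, whereas a `knn_vector` field expects a flat array of numbers. The Python sketch below only illustrates that difference in shape. The `flatten_vector` helper and the two-element sample document are hypothetical, made up for illustration, and are not part of Data Prepper or OpenSearch.

```python
from typing import Any


def flatten_vector(items: list[Any]) -> list[float]:
    """Unwrap items shaped like {"element": x} into plain floats.

    Hypothetical helper: it only mirrors the shape seen in the log preview.
    """
    flat = []
    for item in items:
        if isinstance(item, dict) and "element" in item:
            flat.append(float(item["element"]))
        else:
            flat.append(float(item))
    return flat


# Shape suggested by the log preview (illustrative, shortened to 2 items;
# the real vectors have 300 dimensions).
wrapped = {
    "timestamp": 1727052674,
    "type": "a",
    "vector": [{"element": -0.06285733729600906}, {"element": 0.0}],
}

# Shape a knn_vector field accepts: a flat array of numbers.
expected = {**wrapped, "vector": flatten_vector(wrapped["vector"])}
print(expected["vector"])
```

If the documents really do arrive in the wrapped shape, the mismatch would be introduced somewhere between the parquet file and the sink rather than by the index template itself.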
I’m wondering whether I have the wrong data type, or whether I have set up the pipeline or the index template incorrectly.
Does anyone have any thoughts on this issue?
M.