Data ingestion into a vector index is taking too much time

Versions (relevant - OpenSearch):
2.11

Issue:
We are using an m5.large.search single-node cluster for normal keyword searches in our application, but now we want to add vector search to improve relevance. For this I have created an ingest pipeline for the fields I want to vectorise; we have around 8 fields to vectorise. When I ingest data into this index, it takes far too long: ingesting around 1300-1400 documents into a normal index takes around 20-40 seconds, but ingesting the same data into the vector index takes around 15-20 minutes.
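
For reference, here is a minimal sketch of the kind of setup being described: a `text_embedding` ingest processor mapping source fields to vector fields, and a k-NN index wired to that pipeline. All names here (the pipeline ID, index name, field names, model ID, and dimension) are illustrative placeholders, not taken from the original post:

```python
from opensearchpy import OpenSearch

# Hypothetical connection details; adjust for your domain endpoint and auth.
client = OpenSearch(
    hosts=["https://localhost:9200"],
    http_auth=("admin", "admin"),
    verify_certs=False,
)

MODEL_ID = "your-model-id"  # placeholder: ID of the deployed embedding model

# Ingest pipeline: one text_embedding processor that maps each source text
# field to a corresponding vector target field.
client.ingest.put_pipeline(
    id="text-embedding-pipeline",
    body={
        "description": "Generate embeddings for text fields at ingest time",
        "processors": [
            {
                "text_embedding": {
                    "model_id": MODEL_ID,
                    "field_map": {
                        "title": "title_embedding",
                        "description": "description_embedding",
                        # ... remaining fields, one entry per field to embed
                    },
                }
            }
        ],
    },
)

# Vector index attached to the pipeline; the dimension must match the
# embedding model's output size.
client.indices.create(
    index="my-vector-index",
    body={
        "settings": {
            "index.knn": True,
            "default_pipeline": "text-embedding-pipeline",
        },
        "mappings": {
            "properties": {
                "title_embedding": {
                    "type": "knn_vector",
                    "dimension": 384,  # assumption: depends on your model
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",
                        "space_type": "cosinesimil",
                    },
                },
                # ... one knn_vector field per embedded source field
            }
        },
    },
)
```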

Can anyone suggest what is causing this? Is it the embedding generation, or something else I need to look at? And if the issue is in my strategy for creating the vectors, please let me know. One way to check the embedding side is sketched below.
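
One way to isolate whether embedding generation is the bottleneck is to time the model directly through the ML Commons predict API, outside of any ingest. A rough sketch, reusing the placeholder model ID from above:

```python
import time

from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=["https://localhost:9200"],
    http_auth=("admin", "admin"),
    verify_certs=False,
)

MODEL_ID = "your-model-id"  # placeholder

# Time a single embedding call. With 8 embedded fields, each indexed
# document triggers roughly 8 of these, so per-call latency adds up fast.
start = time.perf_counter()
resp = client.transport.perform_request(
    "POST",
    f"/_plugins/_ml/_predict/text_embedding/{MODEL_ID}",
    body={
        "text_docs": ["a representative piece of text from one of your fields"],
        "return_number": True,
        "target_response": ["sentence_embedding"],
    },
)
elapsed = time.perf_counter() - start
print(f"one embedding took {elapsed * 1000:.0f} ms")

# Back-of-envelope check: 1400 docs * 8 fields * per-call latency gives a
# floor for total ingest time if embeddings are computed serially.
```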

Configuration:
So basically our use case is this: we want to let users query in a prompt format, and based on that we return the data related to their query. To do this, we transform the desired fields into vectors and then search the prompt against those vectors (see the query sketch below). Is this the right approach? Please let me know, as I am very new to this.
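
That is the standard neural-search pattern in OpenSearch: embed the fields at ingest, then query with a `neural` clause so the prompt is embedded with the same model at search time. A minimal sketch, using the same placeholder index, field, and model ID as above:

```python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=["https://localhost:9200"],
    http_auth=("admin", "admin"),
    verify_certs=False,
)

resp = client.search(
    index="my-vector-index",
    body={
        "query": {
            "neural": {
                "title_embedding": {
                    "query_text": "the user's prompt goes here",
                    "model_id": "your-model-id",  # same model used at ingest
                    "k": 10,
                }
            }
        },
        "_source": ["title", "description"],  # return the text, not the vectors
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```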

Thanks a lot in advance!

Hi Hemendra,

What type of model are you using to generate the embeddings during ingestion - is it hosted locally on the cluster, or remotely?
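
If you are not sure, you can check by fetching the model's metadata from ML Commons: a locally hosted model reports a function name such as TEXT_EMBEDDING, while a remotely hosted one reports REMOTE and references a connector. A small sketch (model ID is a placeholder):

```python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=["https://localhost:9200"],
    http_auth=("admin", "admin"),
    verify_certs=False,
)

MODEL_ID = "your-model-id"  # placeholder

info = client.transport.perform_request("GET", f"/_plugins/_ml/models/{MODEL_ID}")
print(info.get("function_name"))  # e.g. "TEXT_EMBEDDING" (local) or "REMOTE"
print(info.get("model_state"))    # e.g. "DEPLOYED"
```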
