Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch v2.12
CentOS 7
Describe the issue:
I’m looking at using the Predict API to build my sparse embedding vectors so that I can cache them outside of OpenSearch.
However, I’ve noticed that using the Predict API to generate the vectors appears to be much slower than when I just index the same content using a pipeline.
For example, if I use the Predict API to generate the vector embeddings for a batch of documents, using a request like the following, it’s roughly 2.5–3x slower (per the timings below) than if I just let a pipeline generate the embeddings:
POST /_plugins/_ml/_predict/sparse_encoding/MY_MODEL_ID_HERE
{"text_docs":["doc 1 text", "doc 2 text", "etc, etc, etc"]}
Relevant Logs or Screenshots:
Here’s what my internal logs show when using the Predict API to generate the embeddings:
Starting the index (500 documents)...
Processed Chunk #1 (100 of 500)... [@ 3m 3s]
Processed Chunk #2 (200 of 500)... [@ 6m 52s]
Processed Chunk #3 (300 of 500)... [@ 10m 0s]
Processed Chunk #4 (400 of 500)... [@ 12m 52s]
Processed Chunk #5 (500 of 500)... [@ 15m 49s]
Refreshing index...
Finished indexing in 15m 50s!
Whereas if I have the pipeline handle the embeddings, this is what I see:
Starting the index (500 documents)...
Processed Chunk #1 (100 of 500)... [@ 1m 14s]
Processed Chunk #2 (200 of 500)... [@ 2m 32s]
Processed Chunk #3 (300 of 500)... [@ 3m 36s]
Processed Chunk #4 (400 of 500)... [@ 4m 42s]
Processed Chunk #5 (500 of 500)... [@ 5m 57s]
Refreshing index...
Finished indexing in 5m 57s!
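For reference, the pipeline run sends the same documents through the standard bulk endpoint with the pipeline attached (again a sketch; the index and pipeline names are placeholders):
POST /my-index/_bulk?pipeline=sparse-embedding-pipeline
{ "index": { "_index": "my-index" } }
{ "text": "doc 1 text" }
{ "index": { "_index": "my-index" } }
{ "text": "doc 2 text" }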
I’ve tried using the Predict API both on a single document at a time and on multiple documents in bulk. Sending the documents in bulk is more efficient, but it’s still much slower than using a pipeline.
Is this extra overhead because of the resources needed to return the embeddings?
Is there a difference in the way that the pipeline method actually generates the embeddings that makes things faster?