Built-in model inference costs for high-volume embedding generation - need clarification

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch Service (AWS managed), planning latest available version

Describe the issue: Need clarification on compute costs and resource consumption when using built-in models for high-volume embedding generation. Planning to ingest ~300,000 documents 4x daily (1.2M embeddings/day) using text_embedding processor with built-in models like huggingface/sentence-transformers/all-MiniLM-L6-v2.

Specific questions:

  1. Does built-in model inference consume significant CPU/memory beyond standard indexing?

  2. Are there any per-embedding charges or only standard instance/storage costs?

  3. At 1.2M daily embeddings, should I provision larger instances specifically for inference workload?

  4. Will embedding generation create bottlenecks requiring dedicated ML nodes?
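For context on questions 3 and 4, the sustained and peak inference rates implied by the numbers above can be sketched with a quick back-of-envelope calculation (the 1-hour batch window is an assumption for illustration, not part of the stated plan):

```python
# Back-of-envelope throughput estimate for the planned workload.
docs_per_batch = 300_000
batches_per_day = 4

embeddings_per_day = docs_per_batch * batches_per_day  # 1,200,000
avg_per_second = embeddings_per_day / 86_400           # spread over 24h

# Assumed: each batch should finish within a 1-hour window,
# so peak demand on the inference path is much higher than the average.
peak_per_second = docs_per_batch / 3_600

print(f"average: {avg_per_second:.1f} embeddings/sec")
print(f"peak:    {peak_per_second:.1f} embeddings/sec (1h batch window)")
```

The gap between the average (~14/sec) and the batch-window peak (~83/sec) is what drives the dedicated-ML-node question: provisioning should target the peak, not the daily average.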

The documentation mentions “reducing model inference costs,” but it’s unclear whether this applies only to external model APIs or also to built-in models running on the cluster.

Configuration:

Planned setup:
- AWS OpenSearch Service managed cluster
- m6g instance family (size TBD based on inference overhead)
- Ingest pipeline with text_embedding processor
- Built-in sentence transformer models
- Auto-embedding via default_pipeline setting
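For reference, the planned auto-embedding setup would look roughly like the sketch below. The index name `documents` and source field `text` are placeholders, and the `model_id` value comes from registering and deploying the pretrained model, so treat this as an illustrative shape rather than a final config:

```json
PUT /_ingest/pipeline/embedding-pipeline
{
  "description": "Generate embeddings at ingest time",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model-id-from-deploy-step>",
        "field_map": {
          "text": "text_embedding"
        }
      }
    }
  ]
}

PUT /documents
{
  "settings": {
    "index.knn": true,
    "index.default_pipeline": "embedding-pipeline"
  },
  "mappings": {
    "properties": {
      "text_embedding": {
        "type": "knn_vector",
        "dimension": 384
      }
    }
  }
}
```

(384 is the output dimension of all-MiniLM-L6-v2.) With `default_pipeline` set, every indexed document would pass through the `text_embedding` processor, which is exactly the inference path my cost questions are about.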

Relevant Logs or Screenshots: N/A - planning phase, seeking cost/performance guidance before implementation.


@abhi3 I think this question should be directed to AWS Support.
The cost will depend on the size of the cluster, the type of underlying nodes, the number of Availability Zones, and many other factors.

This forum covers OpenSearch and its supported pretrained models, but, as you’ve already mentioned, yours is the AWS-managed OpenSearch Service.
