Built-in model inference costs for high-volume embedding generation - need clarification

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch Service (AWS managed), planning latest available version

Describe the issue: Need clarification on compute costs and resource consumption when using built-in models for high-volume embedding generation. Planning to ingest ~300,000 documents 4x daily (1.2M embeddings/day) using text_embedding processor with built-in models like huggingface/sentence-transformers/all-MiniLM-L6-v2.

Specific questions:

  1. Does built-in model inference consume significant CPU/memory beyond standard indexing?

  2. Are there any per-embedding charges or only standard instance/storage costs?

  3. At 1.2M daily embeddings, should I provision larger instances specifically for inference workload?

  4. Will embedding generation create bottlenecks requiring dedicated ML nodes?
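For context on questions 3 and 4, the sustained and peak inference rates implied by the numbers above can be sketched with a quick back-of-envelope calculation (the 1-hour batch window is an assumption for illustration, not part of the stated plan):

```python
# Back-of-envelope throughput estimate for the planned workload.
docs_per_batch = 300_000
batches_per_day = 4

embeddings_per_day = docs_per_batch * batches_per_day  # 1,200,000
avg_per_second = embeddings_per_day / 86_400           # spread over 24h

# Assumed: each batch should finish within a 1-hour window,
# so peak demand on the inference path is much higher than the average.
peak_per_second = docs_per_batch / 3_600

print(f"average: {avg_per_second:.1f} embeddings/sec")
print(f"peak:    {peak_per_second:.1f} embeddings/sec (1h batch window)")
```

The gap between the average (~14/sec) and the batch-window peak (~83/sec) is what drives the dedicated-ML-node question: provisioning should target the peak, not the daily average.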

The documentation mentions “reducing model inference costs,” but it’s unclear whether this applies only to external model APIs or also to built-in models running on the cluster.

Configuration:

Planned setup:
- AWS OpenSearch Service managed cluster
- m6g instance family (size TBD based on inference overhead)
- Ingest pipeline with text_embedding processor
- Built-in sentence transformer models
- Auto-embedding via default_pipeline setting
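For reference, the planned auto-embedding setup would look roughly like the sketch below. The index name `documents` and source field `text` are placeholders, and the `model_id` value comes from registering and deploying the pretrained model, so treat this as an illustrative shape rather than a final config:

```json
PUT /_ingest/pipeline/embedding-pipeline
{
  "description": "Generate embeddings at ingest time",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model-id-from-deploy-step>",
        "field_map": {
          "text": "text_embedding"
        }
      }
    }
  ]
}

PUT /documents
{
  "settings": {
    "index.knn": true,
    "index.default_pipeline": "embedding-pipeline"
  },
  "mappings": {
    "properties": {
      "text_embedding": {
        "type": "knn_vector",
        "dimension": 384
      }
    }
  }
}
```

(384 is the output dimension of all-MiniLM-L6-v2.) With `default_pipeline` set, every indexed document would pass through the `text_embedding` processor, which is exactly the inference path my cost questions are about.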

Relevant Logs or Screenshots: N/A - planning phase, seeking cost/performance guidance before implementation.


@abhi3 I think this question should be directed to AWS Support.
The cost will depend on the size of the cluster, the type of underlying nodes, the number of Availability Zones, and many other factors.

This forum covers OpenSearch and its supported pretrained models, but, as you’ve already mentioned, yours is the AWS-managed OpenSearch Service.
