Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): OpenSearch Service (AWS managed), planning latest available version
Describe the issue: Need clarification on compute costs and resource consumption when using built-in models for high-volume embedding generation. Planning to ingest ~300,000 documents 4x daily (1.2M embeddings/day) using text_embedding processor with built-in models like huggingface/sentence-transformers/all-MiniLM-L6-v2.
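For scale context, a quick back-of-envelope calculation of the throughput this schedule implies (the 1-hour batch window below is an assumption for illustration; the actual ingest window will determine the real sustained rate):

```python
# Back-of-envelope embedding throughput for the planned ingest volume.
docs_per_batch = 300_000
batches_per_day = 4
embeddings_per_day = docs_per_batch * batches_per_day  # 1,200,000

seconds_per_day = 24 * 60 * 60
avg_embeddings_per_sec = embeddings_per_day / seconds_per_day
print(f"{avg_embeddings_per_sec:.1f} embeddings/sec averaged over 24h")

# If each batch must finish within a 1-hour window (assumption),
# the sustained rate during ingest is much higher than the 24h average:
batch_window_sec = 3600
burst_rate = docs_per_batch / batch_window_sec
print(f"{burst_rate:.1f} embeddings/sec during a 1-hour batch window")
```

The gap between the 24h average (~14/sec) and the in-window burst rate (~83/sec per 1-hour window) is why the batch schedule matters for instance sizing, not just the daily total.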
Specific questions:
- Does built-in model inference consume significant CPU/memory beyond standard indexing?
- Are there any per-embedding charges, or only standard instance/storage costs?
- At 1.2M daily embeddings, should I provision larger instances specifically for the inference workload?
- Will embedding generation create bottlenecks requiring dedicated ML nodes?
The documentation mentions “reducing model inference costs,” but it is unclear whether that guidance applies only to externally hosted models (connectors to third-party APIs) or also to built-in models running on the cluster itself.
Configuration:
Planned setup:
- AWS OpenSearch Service managed cluster
- m6g instance family (size TBD based on inference overhead)
- Ingest pipeline with text_embedding processor
- Built-in sentence transformer models
- Auto-embedding via default_pipeline setting
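For reference, a sketch of the planned wiring using OpenSearch's REST API (pipeline and index names are hypothetical; the `model_id` placeholder comes from registering and deploying the pretrained model via the ML Commons APIs; 384 is the output dimension of all-MiniLM-L6-v2):

```json
PUT /_ingest/pipeline/embed-pipeline
{
  "description": "Generate embeddings at ingest time",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model_id from ML Commons register/deploy>",
        "field_map": { "body_text": "body_embedding" }
      }
    }
  ]
}

PUT /docs-index
{
  "settings": {
    "index.knn": true,
    "index.default_pipeline": "embed-pipeline"
  },
  "mappings": {
    "properties": {
      "body_text": { "type": "text" },
      "body_embedding": { "type": "knn_vector", "dimension": 384 }
    }
  }
}
```

With `index.default_pipeline` set, every document indexed into `docs-index` passes through the `text_embedding` processor, so inference runs on the ingesting nodes at index time — which is exactly the CPU/memory overhead the questions above are about.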
Relevant Logs or Screenshots: N/A - planning phase, seeking cost/performance guidance before implementation.