Hi @grunggy, good question, and you’ve already correctly rejected Option 3 (parent/child join). I tested this on a live cluster; my findings are below. If anyone else would like to weigh in, that would be great.
Short answer: use nested documents. On OpenSearch 3.1+ the semantic field type implements this for you automatically.
Your concern about “parent field rewrites” is understandable, but ultimately not a reason to choose flat. The flat approach comes with caveats of its own, see below:
Problem 1: Duplicate parent hits
A document with multiple chunks can dominate your results. Running the same query against a flat index vs a nested index on identical data:
Flat index results for the query “filtering production vector search”
Doc 4_chunk_1 | parent=4 | score=0.1330 # same parent
Doc 3_chunk_0 | parent=3 | score=0.1283
Doc 4_chunk_0 | parent=4 | score=0.1230 # same parent AGAIN
Doc 1_chunk_0 | parent=1 | score=0.1218
Doc 2_chunk_0 | parent=2 | score=0.1189
Nested index results, same query, same data
Doc 4 | Complete Guide to Vector Search | score=0.1330 # once, best chunk wins
Doc 3 | Comparison of ANN Algorithms | score=0.1283
Doc 1 | Introduction to Vector Databases | score=0.1218
Doc 2 | HNSW Algorithm Deep Dive | score=0.1189
At scale, a long document split into 20 chunks can consume your entire top-10. Every application that uses flat chunks has to deduplicate results before passing context to the LLM, and that logic is not trivial (do you pick the highest-scoring chunk per parent? merge chunks? re-rank after dedup?). A partial workaround for flat indexes is sketched below.
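If you end up with a flat index anyway, field collapsing is a partial mitigation: it keeps only the top-scoring chunk per parent in the hit list. This sketch assumes parent_id is mapped as keyword (collapse needs doc_values) and a chunk-text field I’m calling text here, so treat it as illustrative rather than drop-in:

GET /my-index/_search
{
  "query": {"match": {"text": "filtering production vector search"}},
  "collapse": {"field": "parent_id"}
}

This only dedupes the response, though; it doesn’t give you atomic deletes or the best-chunk-wins scoring that nested gives you for free.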
Problem 2: Orphaned chunks on delete
When you delete a document from a nested index, all its chunks are gone atomically in one operation (contrast shown after the flat example). With flat chunks:
# delete the parent doc from the flat index: chunks are NOT deleted
DELETE /my-index/_doc/parent-42
These chunk docs still exist and are still being returned in searches:
GET /my-index/_search
{"query": {"term": {"parent_id": "42"}}}
# returns 8 chunk docs that no longer have a parent
You must remember to run this cleanup yourself, every single time:
POST /my-index/_delete_by_query
{"query": {"term": {"parent_id": "42"}}}
In practice this means every delete in your application is two operations with no atomicity guarantee between them.
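For contrast, here is the nested equivalent. Nested chunks live in the same Lucene block as their parent, so one call removes everything:

# one operation, parent and all chunks gone atomically
DELETE /my-rag-index/_doc/42

No follow-up _delete_by_query, no window where orphaned chunks are still searchable.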
Your concern about metadata update overhead is addressed more practically by setting skip_existing_embedding: true on the semantic field: it detects whether the source text changed and skips the ML inference call if it didn’t. The write cost of the document itself is the same either way; the expensive part is the embedding model call, and that’s handled (demonstrated after Step 4 below).
Example:
Step 1: Register and deploy your embedding model
POST /_plugins/_ml/model_groups/_register
{"name": "rag-models", "description": "Models for RAG pipeline"}
POST /_plugins/_ml/models/_register
{
"name": "my-embedding-model",
"version": "1.0.0",
"model_format": "TORCH_SCRIPT",
"function_name": "TEXT_EMBEDDING",
"model_group_id": "<model_group_id>",
"model_content_hash_value": "<sha256-of-zip>",
"model_config": {
"model_type": "bert",
"embedding_dimension": 768,
"framework_type": "sentence_transformers",
"additional_config": {
"space_type": "l2"
}
},
"url": "<model-zip-url>"
}
POST /_plugins/_ml/models/<model_id>/_deploy
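Note that both _register and _deploy are asynchronous; they return a task_id rather than completing inline. Poll the task until its state is COMPLETED, which is also where you pick up the model_id used in the steps below:

GET /_plugins/_ml/tasks/<task_id>
# response contains "state": "COMPLETED" and the "model_id" once the task finishes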
Step 2: Create the index
PUT /my-rag-index
{
"settings": {"index.knn": true},
"mappings": {
"properties": {
"title": {"type": "text"},
"body": {
"type": "semantic",
"model_id": "<model_id>",
"skip_existing_embedding": true,
"chunking": [
{
"algorithm": "fixed_token_length",
"parameters": {
"token_limit": 300,
"overlap_rate": 0.1,
"tokenizer": "standard"
}
}
]
}
}
}
}
Step 3: Verify the auto-generated mapping
GET /my-rag-index/_mapping
You will see that body expanded into body_semantic_info with the structure below. This is what the engine itself considers the correct production layout:
"body_semantic_info": {
"properties": {
"chunks": {
"type": "nested",
"properties": {
"text": {"type": "text"},
"embedding": {"type": "knn_vector", "dimension": 768, ...}
}
},
"model": {
"properties": {
"id": {"type": "text", "index": false},
"name": {"type": "text", "index": false},
"type": {"type": "text", "index": false}
}
}
}
}
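For perspective, this is roughly the nested query you would otherwise have to write by hand against that internal structure, i.e. what the semantic field saves you from. A sketch based on the generated mapping above, following the standard nested + neural pattern:

GET /my-rag-index/_search
{
  "query": {
    "nested": {
      "path": "body_semantic_info.chunks",
      "score_mode": "max",
      "query": {
        "neural": {
          "body_semantic_info.chunks.embedding": {
            "query_text": "your query text here",
            "model_id": "<model_id>",
            "k": 5
          }
        }
      }
    }
  }
}

score_mode: max is what produces the best-chunk-wins behavior shown under Problem 1.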
Step 4: Index documents (no pipeline setup needed)
PUT /my-rag-index/_doc/1
{
"title": "My Document Title",
"body": "Your full document text here. The semantic field handles chunking and embedding automatically during ingest."
}
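To see skip_existing_embedding earn its keep, re-index the document with only the metadata changed. Because body is identical to the stored source, the inference call is skipped and only the document write happens (illustrative values are mine):

PUT /my-rag-index/_doc/1
{
  "title": "My Document Title (revised)",
  "body": "Your full document text here. The semantic field handles chunking and embedding automatically during ingest."
}
# title changed, body did not: embeddings are reused, no model call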
Step 5: Search
For RAG, hybrid search (BM25 + neural) consistently outperformed either approach alone in my tests (comparison at the end of this post). First create a normalization pipeline:
PUT /_search/pipeline/hybrid-rag-pipeline
{
"phase_results_processors": [
{
"normalization-processor": {
"normalization": {"technique": "min_max"},
"combination": {
"technique": "arithmetic_mean",
"parameters": {"weights": [0.3, 0.7]}
}
}
}
]
}
Then query. The weights above apply positionally: 0.3 to the first (match) clause and 0.7 to the second (neural) clause. Note that you target the semantic field directly, not the internal path; the plain match on body works because the semantic field defaults raw_field_type to "text":
GET /my-rag-index/_search?search_pipeline=hybrid-rag-pipeline
{
"query": {
"hybrid": {
"queries": [
{
"match": {
"body": {"query": "your query text here"}
}
},
{
"neural": {
"body": {
"query_text": "your query text here",
"model_id": "<model_id>",
"k": 5
}
}
}
]
}
},
"_source": ["title", "body"]
}
Why does hybrid outperform pure neural for RAG? Same query, three approaches, same 4-document dataset:
Query: “ACORN filtering production vector search”
The corpus had one document that explicitly covered ACORN and production filtering,
and one that covered general vector database concepts with no mention of filtering.
Pure BM25: Filtering doc ranked 1st, exact term match on “ACORN”, “filtering”, “production”
Pure neural: General vectors doc ranked 1st, small model failed to discriminate semantically
Hybrid: Filtering doc ranked 1st; the BM25 term signal rescued the neural ranking failure,
with a much wider margin between 1st and 2nd (0.79 vs 0.70 after min-max normalization)
than pure neural, where the scores were nearly indistinguishable (0.1294 vs 0.1368)
BM25 and neural complement each other’s failure modes: BM25 handles exact terminology, neural handles paraphrase and synonym matching. The scores above are illustrative; with a real production model, the neural component would carry even more weight.