Using Aliases - performance impact

Versions (relevant - OpenSearch):
2.9

Describe the question:
Hi guys! I would like to ask a question regarding index aliases. Imagine that we have monthly indexes (each ~20 GB) and we would like to search the entire year to find a record by its ID (primary key). Currently we maintain a dedicated mapper index (ID -> index name): we query it first by ID to get the index name, and then query that index with the ID.
It seems we could put an alias over the monthly indexes and query that instead, without using the mapper index.
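
To make the alias idea concrete, here is a minimal sketch of the request bodies involved. The index prefix `records`, the alias name `records-2023`, and the ID `abc-123` are made-up names for illustration, and the sketch assumes the record ID is stored as the document `_id` (if it lives in an ordinary field, a `term` query on that field would take the place of the `ids` query). It only builds the JSON bodies and does not talk to a cluster.

```python
import json

def monthly_indices(prefix, year):
    """Names of the 12 monthly indexes, e.g. records-2023-01 (hypothetical naming)."""
    return [f"{prefix}-{year}-{month:02d}" for month in range(1, 13)]

def alias_actions(alias, indices):
    """Body for POST /_aliases that points one alias at every monthly index."""
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in indices]}

def id_lookup_body(record_id):
    """Body for GET /<alias>/_search, assuming the record ID is the document _id."""
    return {"query": {"ids": {"values": [record_id]}}, "size": 1}

indices = monthly_indices("records", 2023)
print(json.dumps(alias_actions("records-2023", indices), indent=2))
print(json.dumps(id_lookup_body("abc-123")))
```

With the mapper approach the same lookup body would be sent to a single concrete index name instead of the alias.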

My question is: how is a search against an alias executed under the hood? Will all 12 indexes be queried in parallel? Would it put a higher load on the cluster than our original approach? (Right now we query only 2 indexes, but with an alias it could be 12 if there is no other logic on the OpenSearch side.)

I haven’t found an answer to this anywhere, so it would be good to understand the underlying query execution.

Thanks!

The default search type is QUERY_THEN_FETCH. If an alias is used, the coordinating node that receives the search request first sends it to every shard of the 12 indexes, and the shards execute it in parallel. The coordinating node then gathers the results from each shard, sorts them, and fetches the matching documents by document ID. So searching many shards costs extra resources and increases query latency. OpenSearch does have a pre-filter shards strategy that, under some conditions, skips shards that cannot match any documents, but the coordinating node still has to send a request to each shard to find out which ones can be skipped, so the latency improvement is small.
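
The flow described above can be sketched as a toy model in plain Python. This is not OpenSearch code: the shard names, the single shard per monthly index, and the functions `query_phase`, `fetch_phase`, and `search` are all assumptions made up for illustration. It only shows the shape of the scatter-gather: every shard is asked, lightweight hits are merged on the coordinating side, and only then are documents fetched.

```python
from concurrent.futures import ThreadPoolExecutor

def query_phase(shard, record_id):
    """Query phase on one shard: return lightweight hits (shard, doc ID, score) only."""
    name, docs = shard
    return [(name, doc_id, 1.0) for doc_id in docs if doc_id == record_id]

def fetch_phase(shards_by_name, hits):
    """Fetch phase: load full documents only from the shards that produced hits."""
    return [shards_by_name[name][doc_id] for name, doc_id, _ in hits]

def search(shards, record_id, size=10):
    """Toy coordinating node: fan out to every shard, merge, then fetch."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda s: query_phase(s, record_id), shards))
    merged = sorted((h for hits in per_shard for h in hits), key=lambda h: -h[2])[:size]
    return fetch_phase(dict(shards), merged), len(shards)

# Assumption: 12 monthly indexes with one shard each, one document per shard.
shards = [(f"records-2023-{m:02d}[0]", {f"id-{m}": {"month": m}}) for m in range(1, 13)]
docs, shards_queried = search(shards, "id-7")
print(docs, shards_queried)
```

Even though only one shard holds the record, all 12 shards take part in the query phase, which is the overhead the mapper index avoids.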

Thank you very much! This info was very useful, highly appreciated!