Performance for a high throughput OpenSearch cluster (200TB/day)

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
Database: OpenSearch 2.4
UI: OpenDashboards 2.4
OS: Ubuntu 18.04

Describe the issue:
We maintain a logging platform which ingests and stores about 200TB/day worth of logs. At this scale, we are beginning to face issues with performance whilst querying the cluster. I was hoping to start a discussion and get advice around how large clusters can be optimized.

Optimization probably can be done in 2 segments:

  1. Optimizing the index structures, sharding strategies, resource management, etc.
  2. Setting user behavior validation on the UI.

These are the characteristics of the cluster that should be pertinent to the discussion, but do let me know if more information is required.

  • Ingestion - Our system is mostly write-heavy.
  • Fields - Our logs usually ingest messages with the field service as an identifier for the service sending the log. The volume of logs coming from each service is not uniform, several services send a disproportionally large number of logs.
  • Indexing Structure - We send 90+% of logs to a central monolith index. The indexes are rolled over daily.
  • Sharding - We rely on the default strategy of using _id. This means most log messages are spread across many nodes, and one query may potentially reach multiple nodes at once.
  • Replication - We replicate each shard once.
  • Retention - Due to cost constraints, the monolith index can hold up to 6 days worth of logs.
  • UI - Our user behavior is typically uncontrolled. Most usage patterns involve the user simply hitting on “get me logs from the last 1h” without a DSL query.

Given the above, we’d like to understand how to tune the cluster to improve retention with the same amount of resources, and improve query performance.

  • Does moving from a monolith index to a separate index usually give query and write performance benefits? I assume less data has to be scanned during a query for a dataset to be returned.
  • Since we are using service value as the key for each log, would setting a sharding strategy to shard based off service be possible? We understand this might cause several write-hotspots especially for high throughput services, but would the tradeoff be worthwhile? Also, this would cause reads to potentially only touch a minimal number of shards.
  • Is there a way to ensure a compulsory set of filters are always included before the user sends a query? We believe that enforcing the service filter would yield immediate performance benefits, since it is rather prone to usage abuse in its current state.

Hey @ta2023

I had a similer issue but wasnt ingesting 200TB a day for us it was around 500GB a day. Using two Opensearch Servers in the beginning.

Some example what we did were reducing the amount of data/Logs sent from the source. It took some time but only sent what we needed to create/modify dashboards, alerts, etc…

Using different index set for different device also helped. Such as Firewalls/ Windows servers, Switches, Linux server all get routed to different index sets. Think you get the point.

Last was shard/s using this documentation to adjust what we had.

As time went on we had to increase the number of Opensearch servers this helped.

Here is some doc’s we have used to help.

Hope that helps

I think that rolling over by size (e.g. 10GB per shard) rather than by day will give you more consistent performance.

  • Does moving from a monolith index to a separate index usually give query and write performance benefits? I assume less data has to be scanned during a query for a dataset to be returned.

If most searches happen in slices of this monolith (e.g. by service, host, etc), then breaking down the monolith in these slices will help with query performance. Not so much because it scans data (it looks for terms in the index), but because you’ll have one less condition in the query, so OpenSearch will have to intersect fewer document lists.

Otherwise, it shouldn’t help with write performance.

  • Since we are using service value as the key for each log, would setting a sharding strategy to shard based off service be possible? We understand this might cause several write-hotspots especially for high throughput services, but would the tradeoff be worthwhile? Also, this would cause reads to potentially only touch a minimal number of shards.

If your load is write-heavy, creating hotspots for writes sounds like a bad idea to me. But if it makes sense (query-wise) to break down data with one index per service (like in your previous question), that should help.

  • Is there a way to ensure a compulsory set of filters are always included before the user sends a query? We believe that enforcing the service filter would yield immediate performance benefits, since it is rather prone to usage abuse in its’ current state.

I don’t know of the UI side of this, but an alias can include a filter, so if people can use aliases, that would help. Or maybe you can pre-create dashboards for each service (especially if you have indices per service) pointing to their respective index patterns and encourage users to use them.

For more tips and tricks, here’s an oldie but goldie: Elasticsearch for logs and metrics: A deep dive – Velocity 2016, O’REILLY CONFERENCES - YouTube

One thing that I would add to help with writes is async translog: Translog | Elasticsearch Guide [7.10] | Elastic

2 Likes

If most searches happen in slices of this monolith (e.g. by service, host, etc), then breaking down the monolith in these slices will help with query performance. Not so much because it scans data (it looks for terms in the index), but because you’ll have one less condition in the query

For a monolith index, filter query would still have to be broadcast to nodes hosting the index shards and the number of shards that have to be queried would be larger compared to using a dedicated index (per service for instance). Would the number of shards queried play a sizeable role in query performance?

an alias can include a filter

Assuming this is used to enforce required filter fields e.g. service. Would this result in the data scanned from stored_fields after the matching doc ids are determined be roughly the same between the monolith and dedicated index?

Would the number of shards queried play a sizeable role in query performance?

In general, yes: the more shards you query, the more chances that one shard (or the node hosting it) will be slow, which in turn will slow down the response.

Assuming this is used to enforce required filter fields e.g. service. Would this result in the data scanned from stored_fields after the matching doc ids are determined be roughly the same between the monolith and dedicated index?

I’m not sure I understand the question, but in general, data isn’t scanned from stored fields, just the index is used when querying. The only exception is when you use stored fields (or, more frequently, _source) in scripts.

Thanks for the explanation and yes, we’re using _source field for displaying all the fields in the doc (log payload).

With regards to the filtered alias wondering if there are any issues with getting that working with Opensearch 2.4.0 as the result does not seem filtered and have verified the filtered query is working.

I find it strange that it doesn’t work. Did the example provided here not work for you? Index aliases - OpenSearch documentation

This behavior was there since forever, but maybe it broke at some point. Can you paste your command?