Hello everyone,
I’m working on a project that involves deploying OpenSearch at large scale, and I’m looking for best practices from people who have run similar deployments.
Our system generates a large volume of data, and we intend to use OpenSearch for full-text search, log analytics, and monitoring.
Here are the details of our environment:
- Data Volume: We expect to index multiple terabytes of data every day.
- Cluster Size: We will start with a 20-node cluster and scale up as needed.
- Data Retention: We must keep data for at least a year, which raises storage and performance concerns.
- Query Load: The system should handle thousands of queries per second, of varying complexity.
Given these requirements, I have a few questions:
Cluster Configuration: How should a large OpenSearch cluster be configured? Are there particular settings or optimisations we need to be aware of to ensure stability and performance?
Indexing Strategy: How can such a large amount of data be indexed efficiently? Are there best practices for indexing throughput, replication, and shard allocation?
Data Retention & Management: Which techniques do you suggest for managing long-term data retention without adversely affecting performance? Are there effective methods for archiving older data?
Monitoring and Maintenance: Which tools and techniques are best suited to monitoring the performance and health of an OpenSearch cluster of this size? How should we handle node failures, shard rebalancing, and other maintenance tasks?
So far I have been following this page: https://opensearch.org/docs/latest/search-plugins/knn/performance-tuning/minitab
Any insights, firsthand experience, or helpful links would be greatly appreciated. Our goal is to build a stable and efficient OpenSearch deployment, and we would be glad to learn from the community.
Thank you in advance.