kris
March 18, 2022, 6:36pm
#1
The main goal of this post is to be informational: we're in the process of automating performance tests for OpenSearch, and this doc serves as a starting point for figuring out how to interpret that data.
GitHub issue, opened 14 Mar 2022 (labels: enhancement, benchmarking):
## Background
Using OpenSearch 1.2 build 762 ([arm64](https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/1.2.4/762/linux/arm64/dist/opensearch/manifest.yml)/[x64](https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/1.2.4/762/linux/x64/dist/opensearch/manifest.yml)), I ran a set of ~20 performance tests for each of the following single-node configurations:
1. m5.xlarge - Security enabled
2. m5.xlarge - Security disabled
3. m6g.xlarge - Security enabled
4. m6g.xlarge - Security disabled
All tests were run using [OpenSearch Benchmark](https://github.com/opensearch-project/opensearch-benchmark) with an i3.8xlarge EC2 instance as the load generation host. The tests used a [modified version of the default schedule for the nyc_taxis workload](https://github.com/opensearch-project/OpenSearch/files/8247317/nyc_taxis2warmup3test.txt) that runs the original schedule twice with all operations in warmup mode and then three more times as the standard schedule, i.e. two warmup iterations and three test iterations. Additional aggregations were run on the results of each test to average metrics across the different query types, producing a set of query summary metrics.
A new load generator and new OpenSearch single node cluster were provisioned for each test.
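For context, here is a minimal sketch of what that per-query aggregation could look like, assuming a simple arithmetic mean over the per-query-type percentiles; the operation names and latency values below are placeholders, not the actual script or results:

```python
from statistics import mean

# Placeholder per-query-type latency percentiles (ms) from one test run;
# real values come from the OpenSearch Benchmark results for nyc_taxis.
query_latencies = {
    "default":              {"p90": 420, "p99": 440},
    "range":                {"p90": 510, "p99": 530},
    "distance_amount_agg":  {"p90": 380, "p99": 400},
    "autohisto_agg":        {"p90": 450, "p99": 470},
    "date_histogram_agg":   {"p90": 430, "p99": 455},
}

def summarize(latencies_by_query, percentile):
    """Average one latency percentile across all query types (assumed method)."""
    return mean(stats[percentile] for stats in latencies_by_query.values())

summary = {p: summarize(query_latencies, p) for p in ("p90", "p99")}
print(summary)  # {'p90': 438.0, 'p99': 459.0}
```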
## Findings
Some random variation between tests is expected. For indexing throughput, the standard deviation as a percentage of the mean of any percentile statistic (excluding p100) is about 5% across all configurations. For query latency it is about 10%.
Average latency for all queries in a workload can vary by 20% or more between any two tests. Why this happens will require more research; in the meantime we should avoid direct comparisons of one test to another.
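The variability numbers reported here boil down to two simple statistics: the standard deviation expressed as a percentage of the mean, and the percentage spread between the fastest and slowest run. A rough sketch of those calculations, assuming the min/max difference is taken relative to the minimum (sample values are made up):

```python
from statistics import mean, stdev

# Made-up p90 query latencies (ms) from repeated tests of one configuration.
p90_latencies = [431, 445, 410, 468, 425, 452, 438, 419]

avg = mean(p90_latencies)
stdev_pct_of_mean = stdev(p90_latencies) / avg * 100
min_max_pct_diff = (max(p90_latencies) - min(p90_latencies)) / min(p90_latencies) * 100

print(f"mean={avg:.0f} ms, stdev={stdev_pct_of_mean:.1f}% of mean, "
      f"min-max spread={min_max_pct_diff:.1f}%")
```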
Included below are some approximate statistics for index and query metrics for each configuration: the expected (average) value, the standard deviation as a percentage of the mean, and the percent difference between the minimum and maximum. The table is meant to give people a framework for understanding their performance test results and should not be taken as ground truth.
|Instance Type |Security |Expected Indexing Throughput Avg (req/s) |Indexing StDev (% of Mean) |Indexing Min-Max Diff (%) |Expected Query Latency p90 (ms) |Expected Query Latency p99 (ms) |Query StDev (% of Mean) |Query Min-Max Diff (%) |
|--- |--- |--- |--- |--- |--- |--- |--- |--- |
|m5.xlarge |Enabled |30554 |~5% |~12% |431 |449 |~10% |~23% |
|m5.xlarge |Disabled |34472 |~5% |~15% |418 |444 |~10% |~25% |
|m6g.xlarge |Enabled |38625 |~3% |~8% |497 |512 |~8% |~23% |
|m6g.xlarge |Disabled |45447 |~2% |~3% |470 |480 |~5% |~15% |
[Raw Data](https://github.com/opensearch-project/OpenSearch/files/8246226/opensearchPerfData.md)
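As one example of using the table as a framework: for m5.xlarge with security enabled, a run whose indexing throughput lands within a couple of standard deviations of the expected 30554 req/s is most likely normal run-to-run variation rather than a regression. A small sketch of that check; the two-standard-deviation threshold is an arbitrary illustration, not a recommendation derived from the data:

```python
def within_expected_range(observed, expected, stdev_pct, n_sigma=2):
    """Return True if an observed metric is within n_sigma standard deviations
    of the expected value, using the table's stdev-as-%-of-mean figure."""
    stdev_abs = expected * stdev_pct / 100
    return abs(observed - expected) <= n_sigma * stdev_abs

# m5.xlarge, security enabled: expected indexing throughput 30554 req/s, ~5% stdev.
print(within_expected_range(29200, expected=30554, stdev_pct=5))  # True: ~4.4% below
print(within_expected_range(26000, expected=30554, stdev_pct=5))  # False: ~15% below
```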
kris
March 18, 2022, 6:38pm
#2
We welcome feedback and questions as well.