**NOTE:** Development of the proposed Framework will be pursued in another repo;… posting the proposal in this one for visibility
## Objective
The purpose of this doc is to outline key user experiences for the Upgrade Testing Framework so the design is driven from a customer-first perspective. The designs represented in the doc are speculative and open to change; think of them as proposals rather than assertions.
## Terminology
* **Application Developer:** individual who uses OpenSearch as a search/logs engine in their system, but doesn't necessarily manage it
* **Cluster Admin:** individual responsible for the maintenance and/or support of one or more Elasticsearch/OpenSearch clusters
* **Committed OpenSearch Developer:** individual who contributes code to the OpenSearch Project, either its core repositories or the constellation of repositories connected to them
* **Casual OpenSearch Developer:** individual who might be willing to contribute code to the OpenSearch Project, assuming that the process is easy
* **User:** Either a Cluster Admin or a Committed/Casual OpenSearch Developer; an Application Developer might wear any of those hats as well
* **Upgrade:** moving from one version of Elasticsearch/OpenSearch to another without necessarily changing the underlying hardware hosting the cluster
* **Migration:** moving an Elasticsearch/OpenSearch cluster from one underlying hosting solution to another without necessarily changing the version of the cluster
## Background
The Upgrade Testing Framework is a proposed project (https://github.com/opensearch-project/opensearch-migrations/issues/30) for the OpenSearch Project. It has the following goals:
* Provide OpenSearch Developers a way to easily test the outcomes of more complex, real-world upgrades than the existing backwards compatibility (BWC) tests allow, and ideally replace those existing tests. It intends to address upgrades to the core engine and plugins, along with accompanying data, configuration, and metadata, and to handle N+M version upgrades instead of just N+1.
* Perform upgrades during testing in the same manner as Users execute them “in the wild”.
* Provide Cluster Admins mechanisms to predict the outcome of an upgrade before executing one and enable them to make a representative simulacrum of their cluster on their laptop or other development machine in order to test out an upgrade in a safe environment.
* Promote and encapsulate validation mechanisms that Cluster Admins can use to determine whether an upgraded candidate cluster is ready for production traffic.
* Serve as a central repo to iteratively capture the community’s full understanding of differences between versions and continuously assert that each difference remains as expected. This will enable the community to formally understand, document, and ideally resolve these differences.
## Setup
The Upgrade Testing Framework will simply be a Python project hosted on GitHub. To set it up on a laptop, developer desktop, or CI/CD fleet, clone the repo to the host and execute the command-line utility in the downloaded source. The Framework will use system-supplied copies of Python and Docker to perform its work.
## Use Cases
### Core Use Cases
These use-cases are core features that are necessary for any “initial release”.
* UC1 - I want to test an upgrade using default data/tests
* UC2 - I want to test an upgrade using custom data/tests
* UC3 - I want to know if an upgraded cluster is ready for production traffic
### Incremental Use Cases
These use-cases add value, but will be prioritized after the Core Use Cases and based on user feedback/business requirements.
* UC4 - I want to run a bunch of different tests against a bunch of different clusters
* UC5 - I want the framework to auto-generate test config based on my real cluster
* UC6 - I want a predictive assessment of the issues I’ll encounter when upgrading my real cluster
### UC1 - Use Case - I want to test an upgrade using default data/tests
#### Step 1: Define task configuration
Before executing an upgrade test run, the User will need to provide some configuration so the framework knows what to do. It is proposed that, for the initial release, we do that via a configuration file. The Framework will ship with a library of `test_config.json` files and their associated Dockerfiles, data files, and test scripts that encompass the current understanding of the differences between versions.
```
# test_config.json
{
    "cluster_def": {
        "start": {
            "dockerfile": "./path/to/Dockerfile1",
            "tags": [
                "ElasticSearch_7_10_2"
            ]
        },
        "end": {
            "dockerfile": "./path/to/Dockerfile2",
            "tags": [
                "OpenSearch_2_3"
            ]
        },
        "desired_nodes": 2
    },
    "upgrade_def": {
        "style": "snapshot-restore"
    },
    "test_def": {
        "default_tests": {
            "dataset_file": "./data/los_datos_del_diablo.json",
            "test_files_start": [
                "./tests/default_tests_start.py"
            ],
            "test_files_end": [
                "./tests/default_tests_end.py"
            ]
        },
        "default_cluster_comparison_tests": {
            "test_files_end": [
                "./tests/default_cluster_comparison_tests.py"
            ]
        }
    }
}
```
A few notes:
* Users can easily supply whatever Dockerfiles they want to represent the nodes in their cluster
* We will likely want the cluster setup, such as number and type of nodes, to be independently configurable between the starting and ending clusters, but for the initial cut we’ll avoid that complexity
* The “tags” are probably unnecessary, but they are intended to highlight the fact that, later on, we’ll want to down-select which expectations/data/tests to apply to a given cluster based on its configuration. We’ll likely solve this problem by directly interrogating the clusters after they are stood up.
* The “test_def” section is where the framework will learn about the data to load into the cluster and the tests to execute, and when to perform those actions. It seems like data must necessarily be paired with tests (what’s the point of adding data if you don’t have tests that use it?); tests do not necessarily require new data.
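To illustrate how the Framework might consume this configuration internally, here is a minimal sketch (the class and field names below are hypothetical and simply mirror the keys in test_config.json). Note that `dataset_file` is optional, reflecting the point above that tests do not necessarily require new data:
```
# Hypothetical internal model for test_config.json; names simply mirror the JSON keys above.
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterSpec:
    dockerfile: str
    tags: List[str] = field(default_factory=list)


@dataclass
class TestGroup:
    # dataset_file is optional: a test group may only run tests without loading new data
    dataset_file: Optional[str] = None
    test_files_start: List[str] = field(default_factory=list)
    test_files_end: List[str] = field(default_factory=list)


def load_test_config(path: str) -> dict:
    """Parse a test_config.json into typed objects (sketch only, no validation)."""
    with open(path) as f:
        raw = json.load(f)
    clusters = {
        name: ClusterSpec(spec["dockerfile"], spec.get("tags", []))
        for name, spec in raw["cluster_def"].items()
        if name in ("start", "end")
    }
    tests = {name: TestGroup(**group) for name, group in raw["test_def"].items()}
    return {
        "clusters": clusters,
        "desired_nodes": raw["cluster_def"]["desired_nodes"],
        "upgrade_style": raw["upgrade_def"]["style"],
        "tests": tests,
    }
```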
#### Step 2: Invoke the framework
The User then invokes the framework CLI to kick off the upgrade. The CLI gives the user feedback throughout the upgrade process:
```
> ./utf test --file test_config.json
[bootstrap] Building Dockerfile "./path/to/Dockerfile1"...
[bootstrap] Building Dockerfile "./path/to/Dockerfile2"...
[bootstrap] Checking for conflicting Docker resources from previous runs...
[bootstrap] Found Docker resources from previous run: 6 containers, 1 network
[bootstrap] Do you wish to remove the conflicting resources and continue? (y/n)
> y
[bootstrap] Removing conflicting resources...
[initialize_start] Spinning up (2) nodes for the starting cluster...
[initialize_start] Starting Node "start.1"
[initialize_start] Starting Node "start.2"
[initialize_start] Cluster stabilized, Node "start.2" elected master
[test_start] Beginning upload of test data...
[test_start] Upload of test index "los_datos_del_diablo" started
[test_start] Upload of test index "los_datos_del_diablo" completed
[test_start] Uploaded all test data
[test_start] Beginning analysis of cluster's starting state...
[test_start] Running tests "./tests/default_tests_start.py"...
[test_start] Cluster met all expectations for starting state
[execute_upgrade] Beginning upgrade of cluster...
[execute_upgrade] Upgrade style: Snapshot-Restore
[execute_upgrade] Performing snapshot on source cluster...
[execute_upgrade] Snapshot "test_snapshot_2022_11_21" stored at /home/blzbub/snapshots
[execute_upgrade] Starting Node "end.1"
[execute_upgrade] Starting Node "end.2"
[execute_upgrade] Target cluster stabilized, Node "end.1" elected master
[execute_upgrade] Restoring snapshot into target cluster...
[execute_upgrade] Snapshot restored
[execute_upgrade] Finished upgrade of cluster
[test_end] Beginning analysis of target cluster's ending state...
[test_end] Running tests "./tests/default_tests_end.py"...
[test_end] Running tests "./tests/default_cluster_comparison_tests.py"...
[test_end] Target cluster did NOT meet all expectations for ending state; see final report for details
==== FINAL RESULTS ====
Source cluster did meet all starting expectations
Target cluster did NOT meet all ending expectations, see final report
==== LOGS AND REPORTS ====
Final report: ./reports/report.2022_11_11_08_10_31.txt
Full Logs: ./logs/logs.2022_11_11_08_10_31.txt
State File: ./state_file
==== CLUSTER DETAILS ====
Source cluster is STILL RUNNING (127.0.0.1:9200)
Target cluster is STILL RUNNING (127.0.0.1:9201)
```
Some notes:
* The clusters are left running at the end of the test run by default so that the user can poke at them if they want to; this behavior should be controlled via command-line flags so that the default can easily be flipped.
* The CLI should present only the most important, highest-level details to the user via stdout unless something goes wrong. The full logs of every under-the-hood operation are dumped to a log file that the user can explore if they are interested or have a need.
* The overall flow of the testing is broken up into stages marked with bracketed prefixes (`[test_start]`). These should ideally be idempotent break-points where the application can be stopped and restarted. This is made possible by saving the application state to a state file so that a subsequent invocation can pick up where the previous one stopped, which enables the User to do things like stop the application immediately before the upgrade begins and poke around the running containers of the starting cluster for debugging purposes. Setting this up and maintaining it should be straightforward, and the value it adds outweighs the additional complexity (see the sketch following these notes).
* It is expected that the “Test” steps (`[test_start]`, `[test_end]`) will invoke separate Python executables to run unit tests against the starting/ending cluster(s). Separating the test/analysis layer from the CLI orchestration layer makes it possible to do things like have Users run the final validation steps against “real” clusters instead of just ones on their laptop, etc.
* The report will be something that specifically outlines all the tests performed and their results. Ideally, we’d present it in a user-friendly way like an HTML report, but that’s probably an optimization for later as long as we have something functional initially. However, even the initial cut will need to be consumable by non-Developers, so we’ll want to do something more sophisticated than just dump raw unit test results.
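As a rough sketch of the state-file idea (the stage names and `./state_file` path come from the example output above, but the layout and helper names are hypothetical), the run loop could record each completed stage plus any resources it created, then skip ahead on the next invocation:
```
# Sketch of idempotent break-points via a state file; layout and helper names are hypothetical.
import json
import os

STATE_FILE = "./state_file"
STAGES = ["bootstrap", "initialize_start", "test_start", "execute_upgrade", "test_end"]


def load_state() -> dict:
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return {"completed_stages": [], "resources": {}}


def save_state(state: dict) -> None:
    with open(STATE_FILE, "w") as f:
        json.dump(state, f, indent=2)


def run_stages(stage_impls: dict) -> None:
    """Run each stage in order, skipping any stage already recorded as complete."""
    state = load_state()
    for stage in STAGES:
        if stage in state["completed_stages"]:
            print(f"[{stage}] Already complete, skipping")
            continue
        stage_impls[stage](state)         # may record container ids, ports, etc. in state["resources"]
        state["completed_stages"].append(stage)
        save_state(state)                 # checkpoint after every stage so a re-run resumes here
```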
#### Step 3: Review the Results
For the initial cut of the framework, the results will likely just be the output of a standard code testing framework (e.g. pytest). If this doesn’t have the clarity we desire, then we can assemble our own report w/ custom code.
In terms of what types of things the tests are expected to cover and the report is expected to surface, consider the following scenarios:
* Things we expect to work/be true (e.g. the same number of documents before/after the upgrade; see the sketch after this list)
* Updates in OpenSearch core or plugin code that result in translations between the source and target versions (configs, data, and metadata that “morph” during the upgrade)
* Changes in compatibility of the system under test - features removed or changed, bugs introduced or fixed, etc.
* Bugs in the validation framework’s tests themselves
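To make the first scenario concrete, the “same number of documents” check could look roughly like the following pytest-style sketch. How the cluster endpoints are passed to the test is an open question; the environment variables here are just one hypothetical option, and the defaults match the example run above:
```
# Sketch of a "same number of documents" validation test (pytest style).
# How the endpoints reach the test is an open question; the env vars are hypothetical.
import os

import requests

SOURCE = os.environ.get("UTF_SOURCE_CLUSTER", "127.0.0.1:9200")
TARGET = os.environ.get("UTF_TARGET_CLUSTER", "127.0.0.1:9201")
TEST_INDEX = "los_datos_del_diablo"


def doc_count(endpoint: str, index: str) -> int:
    """Return the document count for an index via the _count API."""
    response = requests.get(f"http://{endpoint}/{index}/_count")
    response.raise_for_status()
    return response.json()["count"]


def test_same_number_of_documents():
    # Expectation: the upgrade carries every document of the test index over intact
    assert doc_count(SOURCE, TEST_INDEX) == doc_count(TARGET, TEST_INDEX)
```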
### UC2 - Use Case - I want to test an upgrade using custom data/tests
#### Step 1: Update task configuration
Users can supply custom data (e.g. data that is unique to their setup and not a part of the public repo) using the configuration file. Custom data necessarily implies custom tests, so the two will need to be configured together.
As long as the custom data and tests conform to the same interface as the “default” ones, the framework will ingest and execute them without distinction between what’s “custom” or not. The default data/tests serve as an example to base custom versions on, in addition to the documentation we write.
```
# test_config_custom.json
{
    "cluster_def": {
        "start": {
            "dockerfile": "./path/to/Dockerfile1",
            "tags": [
                "ElasticSearch_7_10_2"
            ]
        },
        "end": {
            "dockerfile": "./path/to/Dockerfile2",
            "tags": [
                "OpenSearch_2_3"
            ]
        },
        "desired_nodes": 2
    },
    "upgrade_def": {
        "style": "snapshot-restore"
    },
    "test_def": {
        "default_tests": {
            "dataset_file": "./data/los_datos_del_diablo.json",
            "test_files_start": [
                "./tests/default_tests_start.py"
            ],
            "test_files_end": [
                "./tests/default_tests_end.py"
            ]
        },
        "default_cluster_comparison_tests": {
            "test_files_end": [
                "./tests/default_cluster_comparison_tests.py"
            ]
        },
        "custom_tests": {
            "dataset_file": "./my_custom_data.json",
            "test_files_start": [
                "./my_custom_tests_1.py"
            ],
            "test_files_end": [
                "./my_custom_tests_2.py"
            ]
        }
    }
}
```
* An existing tool, Elasticdump ([see here](https://github.com/elasticsearch-dump/elasticsearch-dump)), provides a way to export indices from a cluster. It seems reasonable to either base our data format around its format or at the very least accept it.
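If we do accept Elasticdump output directly, loading a `dataset_file` could look roughly like this sketch. It assumes Elasticdump’s newline-delimited “data” output, where each line carries `_index`, `_id`, and `_source`, and pushes the documents through the standard `_bulk` API:
```
# Sketch: load an Elasticdump-style NDJSON dataset file into a cluster via the _bulk API.
# Assumes each line is a JSON object containing "_index", "_id", and "_source" fields.
import json

import requests


def load_dataset(dataset_file: str, endpoint: str) -> None:
    bulk_lines = []
    with open(dataset_file) as f:
        for line in f:
            if not line.strip():
                continue
            doc = json.loads(line)
            # The _bulk format: one action/metadata line, then the document source line
            bulk_lines.append(json.dumps({"index": {"_index": doc["_index"], "_id": doc.get("_id")}}))
            bulk_lines.append(json.dumps(doc["_source"]))
    response = requests.post(
        f"http://{endpoint}/_bulk",
        data="\n".join(bulk_lines) + "\n",
        headers={"Content-Type": "application/x-ndjson"},
    )
    response.raise_for_status()
```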
#### Step 2: Invoke the framework
As before, the User invokes the Framework with their configuration file and the Framework does the rest:
```
> ./utf test --file test_config_custom.json
[bootstrap] Building Dockerfile "./path/to/Dockerfile1"...
.
. # Intermediate steps skipped for clarity
.
[test_start] Beginning upload of test data...
[test_start] Upload of test index "los_datos_del_diablo" started
[test_start] Upload of test index "los_datos_del_diablo" completed
[test_start] Upload of test index "my_custom_data" started
[test_start] Upload of test index "my_custom_data" completed
[test_start] Uploaded all test data
[test_start] Beginning analysis of cluster's starting state...
[test_start] Running tests "./tests/default_tests_start.py"...
[test_start] Running tests "./my_custom_tests_1.py"
[test_start] Cluster met all expectations for starting state
[execute_upgrade] Beginning upgrade of cluster...
.
. # Intermediate steps skipped for clarity
.
[test_end] Beginning analysis of cluster's ending state...
[test_end] Running tests "./test/default_tests_end.py"
[test_end] Running tests "./test/default_cluster_comparison_tests.py"
[test_end] Running tests "./my_custom_tests_2.py"
[test_end] Cluster did NOT meet all expectations for ending state; see final report for details
.
. # And so on
.
```
### UC3 - Use Case - I want to know if an upgraded cluster is ready for production traffic
While the Upgrade Testing Framework, as a whole, is focused on providing Users a way to test the full upgrade process on their laptop, developer desktop, etc against a Dockerized test cluster, the validation tests themselves should ideally not care whether they are being pointed at a “test” cluster or a “real” cluster, and be equally applicable to either. For that reason, we will structure the validation tests as separate executable scripts that the Framework will invoke. This enables the same validation tests to be used by a Cluster Admin to help determine if an upgraded cluster is ready for production traffic.
Presumably, a Cluster Admin would not need to run the validation tests against an upgraded cluster that is currently serving production traffic, as they could use their existing alarms/metrics for that. This is an assumption though and open to re-evaluation.
#### Step 1: Duplicate production cluster
The Cluster Admin would create a duplicate of their production cluster, perhaps using cross-cluster replication.
#### Step 2: Perform manual upgrade of duplicate
The Cluster Admin would use the OpenSearch documentation to perform an upgrade of the duplicate cluster, perhaps using snapshot/restore.
#### Step 3: Execute the validation tests against the clusters
Behind the scenes of the `[test_end]` step, all the Framework is doing is making shell invocations of executable Python scripts that contain tests to be performed against a pair of (ip address, port) tuples. As long as there’s a network path to the source/target clusters, the tests should run the same (a sketch of such a script follows the notes below).
```
> ./test/default_cluster_comparison_tests.py --source_cluster 10.0.0.2:9200 --target_cluster 10.0.1.2:9200 --auth root:i_am_g_root
Running tests against source (10.0.0.2:9200) and target (10.0.1.2:9200)
TEST 1: Same number of documents in index...
TEST 1: Passed
.
.
.
```
* This is obviously easiest for groups of tests that can directly compare the contents of two clusters without requiring a setup step of first uploading specific data that the tests are intrinsically tied to. As a result, there is a need to carefully separate groups of tests based on their setup requirements (e.g. avoid adverse “mingling”).
* However, in principle there’s no reason why a User couldn’t upload test data to their duplicate (pre-upgrade) cluster, perform the upgrade on the duplicate, and then run the validation tests tied to that test data... but would they really want to? Currently unclear.
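Such a script might be shaped roughly like the sketch below. The `--source_cluster`/`--target_cluster`/`--auth` flags mirror the example invocation above; the specific check shown (comparing per-index document counts via `_cat/indices`) is just an illustration:
```
#!/usr/bin/env python3
# Sketch of an executable comparison-test script; flags mirror the invocation above.
import argparse

import requests


def list_indices(endpoint: str, auth) -> dict:
    """Map index name -> document count using the _cat/indices API."""
    response = requests.get(f"http://{endpoint}/_cat/indices?format=json", auth=auth)
    response.raise_for_status()
    return {row["index"]: row["docs.count"] for row in response.json()}


def main() -> int:
    parser = argparse.ArgumentParser(description="Compare a source and target cluster")
    parser.add_argument("--source_cluster", required=True, help="host:port of the source cluster")
    parser.add_argument("--target_cluster", required=True, help="host:port of the target cluster")
    parser.add_argument("--auth", default=None, help="user:password for both clusters")
    args = parser.parse_args()

    auth = tuple(args.auth.split(":", 1)) if args.auth else None
    print(f"Running tests against source ({args.source_cluster}) and target ({args.target_cluster})")

    print("TEST 1: Same number of documents in index...")
    source_indices = list_indices(args.source_cluster, auth)
    target_indices = list_indices(args.target_cluster, auth)
    # Every non-system index in the source should appear in the target with the same doc count
    failures = [
        name for name, count in source_indices.items()
        if not name.startswith(".") and target_indices.get(name) != count
    ]
    print("TEST 1: Passed" if not failures else f"TEST 1: FAILED ({failures})")
    return 0 if not failures else 1


if __name__ == "__main__":
    raise SystemExit(main())
```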
### UC4 - Use Case - I want to run a bunch of different tests against a bunch of different clusters
The above discussion has so far assumed that all the tests we’d want to run could be performed using a single cluster/upgrade process. However, it’s likely we’ll have sets of tests that are mutually incompatible, or at the very least inconvenient to combine. The proposed Framework should be able to handle that simply, but to understand how, we need to go into implementation details.
As has been presented thus far, the Framework is built around the idea of a top-level run loop (a Runner) which executes a sequence of steps (Framework Steps) in order. The Runner will be a self-contained object that handles its own setup/teardown and accepts a list of Framework Steps to perform, along with a test_config.json, as constructor arguments. The Runner is agnostic to the specific Framework Steps and the specific test_config.json. This means that a full library of test_config.json files, along with accompanying test data and validation tests, can be created and executed by a Runner in series.

One trivial way to do this would be to use a Python unit testing framework (unittest, pytest, etc.) to create individual tests that each represent the invocation of the Runner on a specific test_config.json. These tests will likely need to be executed in series due to host resource constraints if only a single host is available, but if multiple hosts are available they can be executed in parallel using the test framework’s built-in mechanisms for running specific tests.
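A minimal sketch of that shape might look like the following (the Runner and Framework Step names come from the description above, but the method signatures are hypothetical):
```
# Sketch of the Runner / Framework Step shape described above; signatures are hypothetical.
from abc import ABC, abstractmethod
from typing import List


class FrameworkStep(ABC):
    """One stage of the flow, e.g. bootstrap, execute_upgrade, or test_end."""

    @abstractmethod
    def run(self, test_config: dict, state: dict) -> None:
        ...


class Runner:
    """Executes an ordered list of Framework Steps against a single test_config."""

    def __init__(self, steps: List[FrameworkStep], test_config: dict):
        self.steps = steps
        self.test_config = test_config
        self.state: dict = {}  # shared scratch space: container ids, ports, snapshot paths, ...

    def run(self) -> None:
        try:
            for step in self.steps:
                step.run(self.test_config, self.state)
        finally:
            self.teardown()

    def teardown(self) -> None:
        # Clean up (or deliberately leave running) the Docker resources recorded in self.state
        pass
```
A library of test_config.json files could then be driven by ordinary unittest/pytest test functions, each of which constructs a Runner with its own list of steps and configuration and calls `run()`.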
### UC5 - Use Case - I want the framework to auto-generate test config based on my real cluster
In this scenario, the user has an existing “real” cluster that they want to convert into a test_config.json, likely alongside representative test data and other configuration. The user would then be able to perform a test upgrade against a high-fidelity simulacrum of their cluster from the comfort of a developer laptop/desktop.
While this use-case needs further design consideration, at a high level it seems possible to write a tool that can be pointed at a “real” cluster and extract the relevant details using the REST API (data types, plugin configuration, cluster configuration, etc). This can then be converted into a test configuration that feeds into the Upgrade Testing Framework like any other.
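As a rough sketch of that idea, such a tool might query a handful of standard REST endpoints and emit a skeleton test_config.json for the User to refine. Exactly which details to capture is an open design question; the endpoints used and the output shape below are only illustrative:
```
# Sketch: derive a cluster description from a live cluster's REST API and emit a
# skeleton test_config.json. Which details to capture is an open design question.
import json

import requests


def describe_cluster(endpoint: str, auth=None) -> dict:
    def get(path: str):
        response = requests.get(f"http://{endpoint}{path}", auth=auth)
        response.raise_for_status()
        return response.json()

    root = get("/")                              # engine distribution and version
    plugins = get("/_cat/plugins?format=json")   # installed plugins, per node
    settings = get("/_cluster/settings")         # persistent/transient cluster settings
    mappings = get("/_mapping")                  # index names, field names, and data types
    return {
        "version": root["version"],
        "plugins": sorted({row["component"] for row in plugins}),
        "settings": settings,
        "indices": {name: body.get("mappings", {}) for name, body in mappings.items()},
    }


def write_test_config(description: dict, path: str) -> None:
    version = description["version"]
    start_tag = f'{version.get("distribution", "elasticsearch")}_{version["number"].replace(".", "_")}'
    config = {
        "cluster_def": {
            # Dockerfile selection, node count, and generated data/tests would be
            # filled in based on the captured description; this is only a skeleton.
            "start": {"dockerfile": "<selected from captured version>", "tags": [start_tag]},
            "end": {"dockerfile": "<chosen target version>", "tags": []},
            "desired_nodes": 2,
        },
        "upgrade_def": {"style": "snapshot-restore"},
        "test_def": {},
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```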
As with creating useful test and data sets for the other use-cases, community involvement will be crucial for determining what specific items to look for and capture when auto-generating test config.
**NOTE -** The mechanism for scanning a cluster and classifying its configuration appears shared w/ the ability to make predictive assessments
### UC6 - Use Case - I want a predictive assessment of the issues I’ll encounter when upgrading my real cluster
In this scenario, the user wants an assessment of what would happen if they were to upgrade an extant, “real” cluster to some proposed version of Elasticsearch/OpenSearch - WITHOUT having to actually execute a real or test upgrade.
This use-case also needs further design consideration, but it seems possible to write a tool that connects to a “real” cluster, uses the REST API to classify its relevant details, then uses the knowledge base of “expectations” captured in the validation tests to make a prediction of the types of issues likely to be encountered. At that point, a report could be generated and supplied to the user.
The complexity here appears to lie in how to model the “expectations” used by those validation tests in a way that they can be consumed by the assessment tool.
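One speculative way to model those “expectations” so that both the validation tests and the assessment tool can consume them is as declarative records with applicable version/feature criteria plus a pointer to the test that asserts them. All field names below are made up:
```
# Speculative sketch of a shared "expectation" record; all field names are made up.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Expectation:
    id: str
    description: str
    applies_when: Dict[str, str]  # e.g. {"source": "<7.0", "target": ">=1.0"}
    severity: str                 # e.g. "breaking", "behavior_change", "informational"
    validation_test: str          # test that asserts this expectation after an upgrade


KNOWLEDGE_BASE: List[Expectation] = [
    Expectation(
        id="mapping_types_removed",
        description="Multiple mapping types per index are no longer supported; affected indices must be reindexed.",
        applies_when={"source": "<7.0", "target": ">=1.0"},
        severity="breaking",
        validation_test="./tests/default_tests_end.py::test_no_mapping_types",
    ),
]
```
The assessment tool from this use-case would filter such a knowledge base against the cluster description captured in UC5 and report the matching records, while the test Runner would actually execute each record’s validation test.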
**NOTE -** The mechanism for scanning a cluster and classifying its configuration appears shared w/ auto-generating test config