**NOTE:** Development of the proposed Framework will be pursued in another repo;… posting the proposal in this one for visibility
## Objective
The purpose of this doc is to outline key user experiences for the Upgrade Testing Framework so the design is driven from a customer-first perspective. The designs represented in the doc are speculative and open to change; think of them as proposals rather than assertions.
## Terminology
* **Application Developer:** individual who uses OpenSearch as a search/logs engine in their system, but doesn't necessarily manage it
* **Cluster Admin:** individual responsible for the maintenance and/or support of one or more Elasticsearch/OpenSearch clusters
* **Committed OpenSearch Developer:** individual who contributes code to the OpenSearch Project, either its core repositories or the constellation of repositories connected to them
* **Casual OpenSearch Developer:** individual who might be willing to contribute code to the OpenSearch Project, assuming that the process is easy
* **User:** Either a Cluster Admin or a Committed/Casual OpenSearch Developer; an Application Developer might wear any of those hats as well
* **Upgrade:** moving from one version of Elasticsearch/OpenSearch to another without necessarily changing the underlying hardware hosting the cluster
* **Migration:** moving an Elasticsearch/OpenSearch cluster from one underlying hosting solution to another without necessarily changing the version of the cluster
## Background
The Upgrade Testing Framework is a proposed project (https://github.com/opensearch-project/opensearch-migrations/issues/30) for the OpenSearch Project. It has the following goals:
* Provide OpenSearch Developers a way to easily test the outcomes of more complex, real-world upgrades than the existing backwards compatibility (BWC) tests allow, and ideally replace those existing tests. It intends to address upgrades to the core engine and plugins, along with accompanying data, configuration, and metadata, and to handle N+M version upgrades instead of just N+1.
* Perform upgrades during testing in the same manner as Users execute them “in the wild”.
* Provide Cluster Admins mechanisms to predict the outcome of an upgrade before executing one and enable them to make a representative simulacrum of their cluster on their laptop or other development machine in order to test out an upgrade in a safe environment.
* Promote and encapsulate validation mechanisms that Cluster Admins can use to determine whether an upgraded candidate cluster is ready for production traffic.
* Serve as a central repo to iteratively capture the community’s full understanding of differences between versions and continuously assert that each difference remains as expected. This will enable the community to formally understand, document, and ideally resolve these differences.
## Setup
The Upgrade Testing Framework will simply be a Python project hosted on GitHub. To set it up on a laptop, developer desktop, or CI/CD fleet, clone the repo to the host and execute the command-line utility in the downloaded source. The Framework will use system-supplied copies of Python and Docker to perform its work.
## Use Cases
### Core Use Cases
These use-cases are core features that are necessary for any “initial release”.
* UC1 - I want to test an upgrade using default data/tests
* UC2 - I want to test an upgrade using custom data/tests
* UC3 - I want to know if an upgraded cluster is ready for production traffic
### Incremental Use Cases
These use-cases add value, but will be prioritized after the Core Use Cases and based on user feedback/business requirements.
* UC4 - I want to run a bunch of different tests against a bunch of different clusters
* UC5 - I want the framework to auto-generate test config based on my real cluster
* UC6 - I want a predictive assessment of the issues I’ll encounter when upgrading my real cluster
### UC1 - Use Case - I want to test an upgrade using default data/tests
#### Step 1: Define task configuration
Before executing an upgrade test run, the User will need to provide some configuration so the framework knows what to do. It is proposed that, for the initial release, we do that via a configuration file. The Framework will ship with a library of `test_config.json` files and their associated Dockerfiles, data files, and test scripts that encompass the current understanding of the differences between versions.
```
# test_config.json
{
    "cluster_def": {
        "start": {
            "dockerfile": "./path/to/Dockerfile1",
            "tags": [
                "ElasticSearch_7_10_2"
            ]
        },
        "end": {
            "dockerfile": "./path/to/Dockerfile2",
            "tags": [
                "OpenSearch_2_3"
            ]
        },
        "desired_nodes": 2
    },
    "upgrade_def": {
        "style": "snapshot-restore"
    },
    "test_def": {
        "default_tests": {
            "dataset_file": "./data/los_datos_del_diablo.json",
            "test_files_start": [
                "./tests/default_tests_start.py"
            ],
            "test_files_end": [
                "./tests/default_tests_end.py"
            ]
        },
        "default_cluster_comparison_tests": {
            "test_files_end": [
                "./tests/default_cluster_comparison_tests.py"
            ]
        }
    }
}
```
A few notes:
* Users can easily supply whatever Dockerfiles they want to represent the nodes in their cluster
* We will likely want the cluster setup, such as number and type of nodes, to be independently configurable between the starting and ending clusters, but for the initial cut we’ll avoid that complexity
* The “tags” are probably unnecessary, but they are intended to highlight the fact that, later on, we’ll want to down-select which expectations/data/tests to apply to a given cluster based on its configuration. We’ll likely solve this problem by directly interrogating the clusters after they are stood up.
* The “test_def” section is where the framework will learn about the data to load into the cluster and the tests to execute, and when to perform those actions. It seems like data must necessarily be paired with tests (what’s the point of adding data if you don’t have tests that use it?); tests do not necessarily require new data.
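To illustrate how the Framework might consume this configuration internally, here is a minimal sketch (the class and field names below are hypothetical and simply mirror the keys in test_config.json). Note that `dataset_file` is optional, reflecting the point above that tests do not necessarily require new data:
```
# Hypothetical internal model for test_config.json; names simply mirror the JSON keys above.
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterSpec:
    dockerfile: str
    tags: List[str] = field(default_factory=list)


@dataclass
class TestGroup:
    # dataset_file is optional: a test group may only run tests without loading new data
    dataset_file: Optional[str] = None
    test_files_start: List[str] = field(default_factory=list)
    test_files_end: List[str] = field(default_factory=list)


def load_test_config(path: str) -> dict:
    """Parse a test_config.json into typed objects (sketch only, no validation)."""
    with open(path) as f:
        raw = json.load(f)
    clusters = {
        name: ClusterSpec(spec["dockerfile"], spec.get("tags", []))
        for name, spec in raw["cluster_def"].items()
        if name in ("start", "end")
    }
    tests = {name: TestGroup(**group) for name, group in raw["test_def"].items()}
    return {
        "clusters": clusters,
        "desired_nodes": raw["cluster_def"]["desired_nodes"],
        "upgrade_style": raw["upgrade_def"]["style"],
        "tests": tests,
    }
```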
#### Step 2: Invoke the framework
The User then invokes the framework CLI to kick off the upgrade. The CLI gives the user feedback throughout the upgrade process:
```
> ./utf test --file test_config.json
[bootstrap] Building Dockerfile "./path/to/Dockerfile1"...
[bootstrap] Building Dockerfile "./path/to/Dockerfile2"...
[bootstrap] Checking for conflicting Docker resources from previous runs...
[bootstrap] Found Docker resources from previous run: 6 containers, 1 network
[bootstrap] Do you wish to remove the conflicting resources and continue? (y/n)
> y
[bootstrap] Removing conflicting resources...
[initialize_start] Spinning up (2) nodes for the starting cluster...
[initialize_start] Starting Node "start.1"
[initialize_start] Starting Node "start.2"
[initialize_start] Cluster stabilized, Node "start.2" elected master
[test_start] Beginning upload of test data...
[test_start] Upload of test index "los_datos_del_diablo" started
[test_start] Upload of test index "los_datos_del_diablo" completed
[test_start] Uploaded all test data
[test_start] Beginning analysis of cluster's starting state...
[test_start] Running tests "./tests/default_tests_start.py"...
[test_start] Cluster met all expectations for starting state
[execute_upgrade] Beginning upgrade of cluster...
[execute_upgrade] Upgrade style: Snapshot-Restore
[execute_upgrade] Performing snapshot on source cluster...
[execute_upgrade] Snapshot "test_snapshot_2022_11_21" stored at /home/blzbub/snapshots
[execute_upgrade] Starting Node "end.1"
[execute_upgrade] Starting Node "end.2"
[execute_upgrade] Target cluster stabilized, Node "end.1" elected master
[execute_upgrade] Restoring snapshot into target cluster...
[execute_upgrade] Snapshot restored
[execute_upgrade] Finished upgrade of cluster
[test_end] Beginning analysis of target cluster's ending state...
[test_end] Running tests "./tests/default_tests_end.py"...
[test_end] Running tests "./tests/default_cluster_comparison_tests.py"...
[test_end] Target cluster did NOT meet all expectations for ending state; see final report for details
==== FINAL RESULTS ====
Source cluster did meet all starting expectations
Target cluster did NOT meet all ending expectations, see final report
==== LOGS AND REPORTS ====
Final report: ./reports/report.2022_11_11_08_10_31.txt
Full Logs: ./logs/logs.2022_11_11_08_10_31.txt
State File: ./state_file
==== CLUSTER DETAILS ====
Source cluster is STILL RUNNING (127.0.0.1:9200)
Target cluster is STILL RUNNING (127.0.0.1:9201)
```
Some notes:
* The clusters are left running at the end of the test run by default so that the user can poke at them if they want to; this behavior should be controlled via command-line flags so that the default can easily be flipped.
* The CLI should present only the most important, highest-level details to the user via stdout unless something goes wrong. The full logs of every under-the-hood operation are dumped to a log file that the user can explore if they are interested or have a need.
* The overall flow of the testing is broken up into stages marked with bracketed prefixes (`[test_start]`). These should ideally be idempotent break-points where the application can be stopped and restarted. This is made possible by saving the application state to a state file so that a subsequent invocation can pick up where the previous one stopped, which enables the User to do things like stop the application immediately before the upgrade begins and poke around the running containers of the starting cluster for debugging purposes. Setting this up and maintaining it should be straightforward, and the value it adds outweighs the additional complexity (see the sketch following these notes).
* It is expected that the “Test” steps (`[test_start]`, `[test_end]`) will invoke separate Python executables to run unit tests against the starting/ending cluster(s). Separating the test/analysis layer from the CLI orchestration layer makes it possible to do things like have Users run the final validation steps against “real” clusters instead of just ones on their laptop, etc.
* The report will be something that specifically outlines all the tests performed and their results. Ideally, we’d present it in a user-friendly way like an HTML report, but that’s probably an optimization for later as long as we have something functional initially. However, even the initial cut will need to be consumable by non-Developers, so we’ll want to do something more sophisticated than just dump raw unit test results.
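As a rough sketch of the state-file idea (the stage names and `./state_file` path come from the example output above, but the layout and helper names are hypothetical), the run loop could record each completed stage plus any resources it created, then skip ahead on the next invocation:
```
# Sketch of idempotent break-points via a state file; layout and helper names are hypothetical.
import json
import os

STATE_FILE = "./state_file"
STAGES = ["bootstrap", "initialize_start", "test_start", "execute_upgrade", "test_end"]


def load_state() -> dict:
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return {"completed_stages": [], "resources": {}}


def save_state(state: dict) -> None:
    with open(STATE_FILE, "w") as f:
        json.dump(state, f, indent=2)


def run_stages(stage_impls: dict) -> None:
    """Run each stage in order, skipping any stage already recorded as complete."""
    state = load_state()
    for stage in STAGES:
        if stage in state["completed_stages"]:
            print(f"[{stage}] Already complete, skipping")
            continue
        stage_impls[stage](state)         # may record container ids, ports, etc. in state["resources"]
        state["completed_stages"].append(stage)
        save_state(state)                 # checkpoint after every stage so a re-run resumes here
```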
#### Step 3: Review the Results
For the initial cut of the framework, the results will likely just be the output of a standard code testing framework (e.g. pytest). If this doesn’t have the clarity we desire, then we can assemble our own report w/ custom code.
In terms of what types of things the tests are expected to cover and the report is expected to surface, consider the following scenarios:
* Things we expect to work/be true (e.g. the same number of documents before/after the upgrade; see the sketch after this list)
* Updates in OpenSearch core or plugin code that result in translations between the source and target versions (configs, data, and metadata that “morph” during the upgrade)
* Changes in compatibility of the system under test - features removed or changed, bugs introduced or fixed, etc.
* Bugs in the validation framework’s tests themselves
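To make the first scenario concrete, the “same number of documents” check could look roughly like the following pytest-style sketch. How the cluster endpoints are passed to the test is an open question; the environment variables here are just one hypothetical option, and the defaults match the example run above:
```
# Sketch of a "same number of documents" validation test (pytest style).
# How the endpoints reach the test is an open question; the env vars are hypothetical.
import os

import requests

SOURCE = os.environ.get("UTF_SOURCE_CLUSTER", "127.0.0.1:9200")
TARGET = os.environ.get("UTF_TARGET_CLUSTER", "127.0.0.1:9201")
TEST_INDEX = "los_datos_del_diablo"


def doc_count(endpoint: str, index: str) -> int:
    """Return the document count for an index via the _count API."""
    response = requests.get(f"http://{endpoint}/{index}/_count")
    response.raise_for_status()
    return response.json()["count"]


def test_same_number_of_documents():
    # Expectation: the upgrade carries every document of the test index over intact
    assert doc_count(SOURCE, TEST_INDEX) == doc_count(TARGET, TEST_INDEX)
```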
### UC2 - Use Case - I want to test an upgrade using custom data/tests
#### Step 1: Update task configuration
Users can supply custom data (e.g. data that is unique to their setup and not a part of the public repo) using the configuration file. Custom data necessarily implies custom tests, so the two will need to be configured together.
As long as the custom data and tests conform to the same interface as the “default” ones, the framework will ingest and execute them without distinction between what’s “custom” or not. The default data/tests serve as an example to base custom versions on, in addition to the documentation we write.
```
# test_config_custom.json
{
    "cluster_def": {
        "start": {
            "dockerfile": "./path/to/Dockerfile1",
            "tags": [
                "ElasticSearch_7_10_2"
            ]
        },
        "end": {
            "dockerfile": "./path/to/Dockerfile2",
            "tags": [
                "OpenSearch_2_3"
            ]
        },
        "desired_nodes": 2
    },
    "upgrade_def": {
        "style": "snapshot-restore"
    },
    "test_def": {
        "default_tests": {
            "dataset_file": "./data/los_datos_del_diablo.json",
            "test_files_start": [
                "./tests/default_tests_start.py"
            ],
            "test_files_end": [
                "./tests/default_tests_end.py"
            ]
        },
        "default_cluster_comparison_tests": {
            "test_files_end": [
                "./tests/default_cluster_comparison_tests.py"
            ]
        },
        "custom_tests": {
            "dataset_file": "./my_custom_data.json",
            "test_files_start": [
                "./my_custom_tests_1.py"
            ],
            "test_files_end": [
                "./my_custom_tests_2.py"
            ]
        }
    }
}
```
* An existing tool, Elasticdump ([see here](https://github.com/elasticsearch-dump/elasticsearch-dump)), provides a way to export indices from a cluster. It seems reasonable to either base our data format around its format or at the very least accept it.
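If we do accept Elasticdump output directly, loading a `dataset_file` could look roughly like this sketch. It assumes Elasticdump’s newline-delimited “data” output, where each line carries `_index`, `_id`, and `_source`, and pushes the documents through the standard `_bulk` API:
```
# Sketch: load an Elasticdump-style NDJSON dataset file into a cluster via the _bulk API.
# Assumes each line is a JSON object containing "_index", "_id", and "_source" fields.
import json

import requests


def load_dataset(dataset_file: str, endpoint: str) -> None:
    bulk_lines = []
    with open(dataset_file) as f:
        for line in f:
            if not line.strip():
                continue
            doc = json.loads(line)
            # The _bulk format: one action/metadata line, then the document source line
            bulk_lines.append(json.dumps({"index": {"_index": doc["_index"], "_id": doc.get("_id")}}))
            bulk_lines.append(json.dumps(doc["_source"]))
    response = requests.post(
        f"http://{endpoint}/_bulk",
        data="\n".join(bulk_lines) + "\n",
        headers={"Content-Type": "application/x-ndjson"},
    )
    response.raise_for_status()
```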
#### Step 2: Invoke the framework
As before, the User invokes the Framework with their configuration file and the Framework does the rest:
```
> ./utf test --file test_config_custom.json
[bootstrap] Building Dockerfile "./path/to/Dockerfile1"...
.
. # Intermediate steps skipped for clarity
.
[test_start] Beginning upload of test data...
[test_start] Upload of test index "los_datos_del_diablo" started
[test_start] Upload of test index "los_datos_del_diablo" completed
[test_start] Upload of test index "my_custom_data" started
[test_start] Upload of test index "my_custom_data" completed
[test_start] Uploaded all test data
[test_start] Beginning analysis of cluster's starting state...
[test_start] Running tests "./tests/default_tests_start.py"...
[test_start] Running tests "./my_custom_tests_1.py"
[test_start] Cluster met all expectations for starting state
[execute_upgrade] Beginning upgrade of cluster...
.
. # Intermediate steps skipped for clarity
.
[test_end] Beginning analysis of cluster's ending state...
[test_end] Running tests "./test/default_tests_end.py"
[test_end] Running tests "./test/default_cluster_comparison_tests.py"
[test_end] Running tests "./my_custom_tests_2.py"
[test_end] Cluster did NOT meet all expectations for ending state; see final report for details
.
. # And so on
.
```
### UC3 - Use Case - I want to know if an upgraded cluster is ready for production traffic
While the Upgrade Testing Framework, as a whole, is focused on providing Users a way to test the full upgrade process on their laptop, developer desktop, etc against a Dockerized test cluster, the validation tests themselves should ideally not care whether they are being pointed at a “test” cluster or a “real” cluster, and be equally applicable to either. For that reason, we will structure the validation tests as separate executable scripts that the Framework will invoke. This enables the same validation tests to be used by a Cluster Admin to help determine if an upgraded cluster is ready for production traffic.
Presumably, a Cluster Admin would not need to run the validation tests against an upgraded cluster that is currently serving production traffic, as they could use their existing alarms/metrics for that. This is an assumption though and open to re-evaluation.
#### Step 1: Duplicate production cluster
The Cluster Admin would create a duplicate of their production cluster, perhaps using cross-cluster replication.
#### Step 2: Perform manual upgrade of duplicate
The Cluster Admin would use the OpenSearch documentation to perform an upgrade of the duplicate cluster, perhaps using snapshot/restore.
#### Step 3: Execute the validation tests against the clusters
Behind the scenes of the `[test_end]` step, all the Framework is doing is making shell invocations of executable Python scripts that contain tests to be performed against a pair of (ip address, port) tuples. As long as there’s a network path to the source/target clusters, the tests should run the same (a sketch of such a script follows the notes below).
```
> ./test/default_cluster_comparison_tests.py --source_cluster 10.0.0.2:9200 --target_cluster 10.0.1.2:9200 --auth root:i_am_g_root
Running tests against source (10.0.0.2:9200) and target (10.0.1.2:9200)
TEST 1: Same number of documents in index...
TEST 1: Passed
.
.
.
```
* This is obviously easiest for groups of tests that can directly compare the contents of two clusters without requiring a setup step of first uploading specific data that the tests are intrinsically tied to. As a result, there is a need to carefully separate groups of tests based on their setup requirements (e.g. avoid adverse “mingling”).
* However, in principle there’s no reason why a User couldn’t upload test data to their duplicate (pre-upgrade) cluster, perform the upgrade on the duplicate, and then run the validation tests tied to that test data... but would they really want to? Currently unclear.
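Such a script might be shaped roughly like the sketch below. The `--source_cluster`/`--target_cluster`/`--auth` flags mirror the example invocation above; the specific check shown (comparing per-index document counts via `_cat/indices`) is just an illustration:
```
#!/usr/bin/env python3
# Sketch of an executable comparison-test script; flags mirror the invocation above.
import argparse

import requests


def list_indices(endpoint: str, auth) -> dict:
    """Map index name -> document count using the _cat/indices API."""
    response = requests.get(f"http://{endpoint}/_cat/indices?format=json", auth=auth)
    response.raise_for_status()
    return {row["index"]: row["docs.count"] for row in response.json()}


def main() -> int:
    parser = argparse.ArgumentParser(description="Compare a source and target cluster")
    parser.add_argument("--source_cluster", required=True, help="host:port of the source cluster")
    parser.add_argument("--target_cluster", required=True, help="host:port of the target cluster")
    parser.add_argument("--auth", default=None, help="user:password for both clusters")
    args = parser.parse_args()

    auth = tuple(args.auth.split(":", 1)) if args.auth else None
    print(f"Running tests against source ({args.source_cluster}) and target ({args.target_cluster})")

    print("TEST 1: Same number of documents in index...")
    source_indices = list_indices(args.source_cluster, auth)
    target_indices = list_indices(args.target_cluster, auth)
    # Every non-system index in the source should appear in the target with the same doc count
    failures = [
        name for name, count in source_indices.items()
        if not name.startswith(".") and target_indices.get(name) != count
    ]
    print("TEST 1: Passed" if not failures else f"TEST 1: FAILED ({failures})")
    return 0 if not failures else 1


if __name__ == "__main__":
    raise SystemExit(main())
```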
### UC4 - Use Case - I want to run a bunch of different tests against a bunch of different clusters
The above discussion has so far assumed that all the tests we’d want to run could be performed using a single cluster/upgrade process. However, it’s likely we’ll have sets of tests that are mutually incompatible, or at the very least inconvenient to combine. The proposed Framework should be able to handle that simply, but to understand how, we need to go into implementation details.
As has been presented thus far, the Framework is built around the idea of a top-level run loop (a Runner) which executes a sequence of steps (Framework Steps) in order. The Runner will be a self-contained object that handles its own setup/teardown and accepts a list of Framework Steps to perform, along with a test_config.json, as constructor arguments. The Runner is agnostic to the specific Framework Steps and the specific test_config.json. This means that a full library of test_config.json files, along with accompanying test data and validation tests, can be created and executed by a Runner in series.

One trivial way to do this would be to use a Python unit testing framework (unittest, pytest, etc.) to create individual tests that each represent the invocation of the Runner on a specific test_config.json. These tests will likely need to be executed in series due to host resource constraints if only a single host is available, but if multiple hosts are available they can be executed in parallel using the test framework’s built-in mechanisms for running specific tests.
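A minimal sketch of that shape might look like the following (the Runner and Framework Step names come from the description above, but the method signatures are hypothetical):
```
# Sketch of the Runner / Framework Step shape described above; signatures are hypothetical.
from abc import ABC, abstractmethod
from typing import List


class FrameworkStep(ABC):
    """One stage of the flow, e.g. bootstrap, execute_upgrade, or test_end."""

    @abstractmethod
    def run(self, test_config: dict, state: dict) -> None:
        ...


class Runner:
    """Executes an ordered list of Framework Steps against a single test_config."""

    def __init__(self, steps: List[FrameworkStep], test_config: dict):
        self.steps = steps
        self.test_config = test_config
        self.state: dict = {}  # shared scratch space: container ids, ports, snapshot paths, ...

    def run(self) -> None:
        try:
            for step in self.steps:
                step.run(self.test_config, self.state)
        finally:
            self.teardown()

    def teardown(self) -> None:
        # Clean up (or deliberately leave running) the Docker resources recorded in self.state
        pass
```
A library of test_config.json files could then be driven by ordinary unittest/pytest test functions, each of which constructs a Runner with its own list of steps and configuration and calls `run()`.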
### UC5 - Use Case - I want the framework to auto-generate test config based on my real cluster
In this scenario, the user has an existing “real” cluster that they want to convert into a test_config.json, likely alongside representative test data and other configuration. The user would then be able to perform a test upgrade against a high-fidelity simulacrum of their cluster from the comfort of a developer laptop/desktop.
While this use-case needs further design consideration, at a high level it seems possible to write a tool that can be pointed at a “real” cluster and extract the relevant details using the REST API (data types, plugin configuration, cluster configuration, etc). This can then be converted into a test configuration that feeds into the Upgrade Testing Framework like any other.
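As a rough sketch of that idea, such a tool might query a handful of standard REST endpoints and emit a skeleton test_config.json for the User to refine. Exactly which details to capture is an open design question; the endpoints used and the output shape below are only illustrative:
```
# Sketch: derive a cluster description from a live cluster's REST API and emit a
# skeleton test_config.json. Which details to capture is an open design question.
import json

import requests


def describe_cluster(endpoint: str, auth=None) -> dict:
    def get(path: str):
        response = requests.get(f"http://{endpoint}{path}", auth=auth)
        response.raise_for_status()
        return response.json()

    root = get("/")                              # engine distribution and version
    plugins = get("/_cat/plugins?format=json")   # installed plugins, per node
    settings = get("/_cluster/settings")         # persistent/transient cluster settings
    mappings = get("/_mapping")                  # index names, field names, and data types
    return {
        "version": root["version"],
        "plugins": sorted({row["component"] for row in plugins}),
        "settings": settings,
        "indices": {name: body.get("mappings", {}) for name, body in mappings.items()},
    }


def write_test_config(description: dict, path: str) -> None:
    version = description["version"]
    start_tag = f'{version.get("distribution", "elasticsearch")}_{version["number"].replace(".", "_")}'
    config = {
        "cluster_def": {
            # Dockerfile selection, node count, and generated data/tests would be
            # filled in based on the captured description; this is only a skeleton.
            "start": {"dockerfile": "<selected from captured version>", "tags": [start_tag]},
            "end": {"dockerfile": "<chosen target version>", "tags": []},
            "desired_nodes": 2,
        },
        "upgrade_def": {"style": "snapshot-restore"},
        "test_def": {},
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```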
As with creating useful test and data sets for the other use-cases, community involvement will be crucial for determining what specific items to look for and capture when auto-generating test config.
**NOTE -** The mechanism for scanning a cluster and classifying its configuration appears shared w/ the ability to make predictive assessments
### UC6 - Use Case - I want a predictive assessment of the issues I’ll encounter when upgrading my real cluster
In this scenario, the user wants an assessment of what would happen if they were to upgrade an extant, “real” cluster to some proposed version of Elasticsearch/OpenSearch - WITHOUT having to actually execute a real or test upgrade.
This use-case also needs further design consideration, but it seems possible to write a tool that connects to a “real” cluster, uses the REST API to classify its relevant details, then uses the knowledge base of “expectations” captured in the validation tests to make a prediction of the types of issues likely to be encountered. At that point, a report could be generated and supplied to the user.
The complexity here appears to lie in how to model the “expectations” used by those validation tests in a way that they can be consumed by the assessment tool.
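One speculative way to model those “expectations” so that both the validation tests and the assessment tool can consume them is as declarative records with applicable version/feature criteria plus a pointer to the test that asserts them. All field names below are made up:
```
# Speculative sketch of a shared "expectation" record; all field names are made up.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Expectation:
    id: str
    description: str
    applies_when: Dict[str, str]  # e.g. {"source": "<7.0", "target": ">=1.0"}
    severity: str                 # e.g. "breaking", "behavior_change", "informational"
    validation_test: str          # test that asserts this expectation after an upgrade


KNOWLEDGE_BASE: List[Expectation] = [
    Expectation(
        id="mapping_types_removed",
        description="Multiple mapping types per index are no longer supported; affected indices must be reindexed.",
        applies_when={"source": "<7.0", "target": ">=1.0"},
        severity="breaking",
        validation_test="./tests/default_tests_end.py::test_no_mapping_types",
    ),
]
```
The assessment tool from this use-case would filter such a knowledge base against the cluster description captured in UC5 and report the matching records, while the test Runner would actually execute each record’s validation test.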
**NOTE -** The mechanism for scanning a cluster and classifying its configuration appears shared w/ auto-generating test config