On January 21st, 2021, we started working on getting OpenSearch ready for public release. Some of us might already be familiar to you as contributors to Open Distro for Elasticsearch. Others are new to the project. Over time, we hope you’ll get to know all of us and we’ll get to know you.
As we’ve mentioned before, there was some setup work that needed to be done before we could make the repos public. It’s been an intense and exciting time. We’ve created and reviewed over 650 Pull Requests (PRs), made changes to 56,355 files, and deleted over 4,796,305 lines of code. The focus of all this work has been to create a code base that can serve as a solid open source foundation for us to build on together. Now that the day is finally here (did we mention it’s been intense?), we wanted to tell you more details about those changes.
Simple Things First
We initiated the fork from the 7.10 branches of Elasticsearch and Kibana. While all other branches and tags were ignored, we made sure to migrate the full git history of the 7.10 branch in order to retain proper attribution for the code’s original authors. This also lets us make all future changes or modifications with full transparency. Once we’d created the initial fork, we set out to remove any code that wasn’t compatible with the Apache License, Version 2.0 (ALv2).
The first step to preparing the code base for community release was to remove X-Pack code: git rm -rf ./x-pack. For auditability and transparency, we opened and reviewed a PR for each removal or change.
While this single command removed all of the non-ALv2-compatible code, the code base still contained numerous references to (now removed) x-pack code along with REST client functionality for features that aren’t available in the open source distribution. Similarly, Kibana’s source code and build infrastructure had tightly integrated references and workflow integrations that distinguished x-pack builds from the open source builds. Simply removing the x-pack directory left the build of both code bases broken.
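To illustrate the kind of cleanup this implies, here is a minimal sketch of how one might list leftover references after the directory is gone. It is not the tooling we actually used; the function name find_xpack_references and the search pattern (matching "x-pack" or "xpack", case-insensitively) are assumptions made purely for illustration.

```python
import os
import re
from pathlib import Path

# Assumed pattern for illustration: "x-pack" or "xpack", any letter case.
XPACK_PATTERN = re.compile(r"x-?pack", re.IGNORECASE)


def find_xpack_references(root):
    """Return sorted (relative_path, line_number, stripped_line) tuples
    for every line under `root` that matches XPACK_PATTERN."""
    root = Path(root)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune git metadata so the walk never descends into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                # Skip binary or unreadable files.
                continue
            for lineno, line in enumerate(text.splitlines(), start=1):
                if XPACK_PATTERN.search(line):
                    rel = path.relative_to(root).as_posix()
                    hits.append((rel, lineno, line.strip()))
    return sorted(hits)
```

Calling find_xpack_references(".") from the root of a checkout would return one tuple per matching line, which gives a rough picture of how much build and client code still points at the removed directory.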
Removing X-Pack References
Even though the code for X-Pack features lived in a single directory, the X-Pack interfaces weren’t similarly compartmentalized. We could have retained these interfaces, but with the features removed the endpoints would simply fail. Since this would be confusing for users and would require us to maintain unused code, we removed the following high-level REST client interfaces from the open source code base (again, with each removal performed in a separate PR):
analytics
asyncsearch
ccr
common
enrich
eql
graph
indexlifecycle
license
migration
ml
rollup
security
transform
watcher
Similarly for the Kibana fork, each hard-coded X-Pack reference needed to be removed, and the broken references fixed. Once this was complete we began the effort to achieve a clean build of the code base by tackling any failing unit and integration tests.
One Elasticsearch “Flavor”, Multiple Distributions
The Elasticsearch build framework has a concept of different build “flavors” to support the build, packaging, and distribution of X-Pack features at various Elastic license and subscription tiers. This logic controls which licensed source code gets packaged, and how, on the different platform distributions supported by the stack. OpenSearch has no concept of license tiers, so “build flavor” logic is unnecessary, and we removed it. We did retain the packaging and distribution logic for the various supported platforms. This ensures a consistent compatibility matrix without the need for controlling which source components are included in, or excluded from, the distribution.
“Phone Home” Telemetry and Marketing Content
If a deployment of the Elasticsearch and Kibana stack (“ELK”) is connected to the internet, it collects and sends information about how you use it (known as telemetry) to Elastic. Telemetry collection is integrated within the core open source Kibana and Elasticsearch services, enabling it to gather different levels of user and cluster information (e.g., API calls, field types, aggregations). Once collected, that data is periodically uploaded to a telemetry server at Elastic.co. In Open Distro for Elasticsearch, we use configuration settings to turn telemetry collection off by default, and for OpenSearch we disabled the collection and posting of telemetry in the code itself. We left the now-inactive source code that performs telemetry intact so we as a community can collectively decide what to do with it going forward.
In addition to telemetry, the open source Kibana distribution contained an RSS newsfeed of product alerts and marketing updates. We turned off the newsfeed feature, and like with the telemetry implementation, we left the source code in place so we can decide about next steps together.
CI, “gradle check”, and End to End validation
The CI, test fixtures, backwards compatibility framework, and packaging implementation also relied on sending support artifacts and configuration settings to Elastic-hosted servers. To ensure the open source repository had a reliable CI infrastructure, we set up a new end-to-end automated testing infrastructure. The new CI infrastructure checks incoming pull requests and runs nightly builds and tests to catch regressions in the code base early. The initial number of checks is quite small, but we plan to maintain, build on, and collaboratively improve this initial configuration over time.
Renaming
Just about the time we got all of this done, we settled on the new name and started applying it to the code base. That meant going into our now somewhat stable repos and breaking everything all over again. For about two weeks our lives have pretty much looked like this:
After a lot of tedious resolving of merge conflicts, we finally got ourselves back to stable.
Where we are now
As of this writing, both OpenSearch and OpenSearch Dashboards have all unit and integration tests passing. We haven’t added any new features or even bug fixes yet. We’re still using version 7.10 as our version number, but we plan to reset that to 1.0 (you can see that work in progress in the versioning branch). We’re also still setting up CI nightly builds and signed artifacts, so keep an eye out for that.
The team (so far)
Finally, we wanted you to see some of the names you’ll start seeing on PRs and around the forums.
OpenSearch:
Abbas (abbashus), Charlotte (CEHENKLE), Harold (harold-wang), Himanshu (setiah), Nick (nknize), Rabi (adnapibar), Sarat (saratvemulapalli), Tianli (tlfeng)
OpenSearch Dashboards:
Anan (ananzh), Bishoy (boktorbb-amzn), Mihir (mihirsoni), Rocky (kavilla), Sean (seanneumann)
Infrastructure:
Barani (bbarani), Peter (peternied) and Peter (peterzhuamazon)
The journey has just begun…
So that’s what we’ve been up to. Our goal is to build the best open-source distributed search engine on the planet. But more importantly, we want that search engine to be built by a passionate and diverse community — so please join us. Submit pull requests, write documentation, open issues (either on OpenSearch or OpenSearch Dashboards), read the news and attend the community meetings.
Let’s go invent together.