Preparing OpenSearch and OpenSearch Dashboards for release

On January 21st, 2021, we started working on getting OpenSearch ready for public release. Some of us might already be familiar to you as contributors to Open Distro for Elasticsearch. Others are new to the project. Over time, we hope you’ll get to know all of us and we’ll get to know you.

As we’ve mentioned before, there was some setup work that needed to be done before we could make the repos public. It’s been an intense and exciting time. We’ve created and reviewed over 650 Pull Requests (PRs), made changes to 56,355 files, and deleted over 4,796,305 lines of code. The focus of all this work has been to create a code base that can serve as a solid open source foundation for us to build on together. Now that the day is finally here (did we mention it’s been intense?), we want to share more details about those changes.

Simple Things First

We initiated the fork from the 7.10 branches of Elasticsearch and Kibana. While all other branches and tags were ignored, we made sure to migrate the full git history of the 7.10 branch in order to retain proper attribution for the code’s original authors. This also lets us make all future changes or modifications with full transparency. Once we’d created the initial fork, we set out to remove any code that wasn’t compatible with the Apache License, Version 2.0 (ALv2).

The first step in preparing the code base for community release was to remove the X-Pack code by running git rm -rf ./x-pack. For auditability and transparency, we opened and reviewed a PR for each removal or change.

While this single command removed all of the non-ALv2-compatible code, the code base still contained numerous references to the (now removed) X-Pack code, along with REST client functionality for features that aren’t available in the open source distribution. Similarly, Kibana’s source code and build infrastructure were tightly coupled to X-Pack, with references and workflows that distinguished X-Pack builds from open source builds. Simply removing the x-pack directory left the builds of both code bases broken.
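To illustrate the kind of cleanup this implies, here is a minimal sketch of how leftover X-Pack references could be located after the directory is removed. It is not the tooling the team used: the function name, the regular expression, and the list of file suffixes are all assumptions made for this example.

```python
import re
from pathlib import Path

# Assumed pattern for illustration: matches "x-pack", "xpack" and "x_pack", case-insensitively.
XPACK_PATTERN = re.compile(r"x[-_]?pack", re.IGNORECASE)

# Assumed set of file types worth scanning; adjust for the repository at hand.
DEFAULT_SUFFIXES = (".java", ".gradle", ".ts", ".js", ".json", ".yml")


def find_xpack_references(root, suffixes=DEFAULT_SUFFIXES):
    """Return (relative path, line number, stripped line) for every line mentioning X-Pack."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories, git metadata, and file types we do not scan.
        if not path.is_file() or ".git" in relative.parts or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if XPACK_PATTERN.search(line):
                hits.append((relative.as_posix(), number, line.strip()))
    return hits
```

Calling find_xpack_references(".") from the root of a checkout would list each remaining reference, which could then be reviewed and removed in its own PR.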

Removing X-Pack References

Even though the code for X-Pack features lived in a single directory, the X-Pack interfaces weren’t similarly compartmentalized. We could have retained these interfaces, but with the features removed the endpoints would simply fail. Since this would confuse users and require us to maintain unused code, we removed the following High Level REST Client interfaces from the open source code base (again, with each removal performed in a separate PR):

  • analytics
  • asyncsearch
  • ccr
  • common
  • enrich
  • eql
  • graph
  • indexlifecycle
  • license
  • migration
  • ml
  • rollup
  • security
  • transform
  • watcher

Similarly for the Kibana fork, each hard-coded X-Pack reference needed to be removed, and the broken references fixed. Once this was complete we began the effort to achieve a clean build of the code base by tackling any failing unit and integration tests.

One Elasticsearch “Flavor”, Multiple Distributions

The Elasticsearch build framework has a concept of different build “flavors” to support the build, packaging, and distribution of X-Pack features at various Elastic license and subscription tiers. This logic controls which licensed source code gets packaged, and how, for each platform distribution supported by the stack. OpenSearch has no concept of license tiers, so “build flavor” logic is unnecessary, and we removed it. We did retain the packaging and distribution logic for the various supported platforms. This ensures a consistent compatibility matrix without the need to control which source components are included in, or excluded from, the distribution.

“Phone Home” Telemetry and Marketing Content

If a deployment of the Elasticsearch and Kibana stack (“ELK”) is connected to the internet, it collects and sends information about how you use it (known as telemetry) to Elastic. Telemetry collection is integrated within the core open source Kibana and Elasticsearch services, enabling it to gather different levels of user and cluster information (e.g., API calls, field types, aggregations). Once collected, that data is periodically uploaded to a telemetry server at Elastic.co. In Open Distro for Elasticsearch, we use configuration settings to turn telemetry collection off by default, and for OpenSearch we disabled the collection and posting of telemetry in the code itself. We left the now-inactive source code that performs telemetry intact so we as a community can collectively decide what to do with it going forward.

In addition to telemetry, the open source Kibana distribution contained an RSS newsfeed of product alerts and marketing updates. We turned off the newsfeed feature and, as with the telemetry implementation, left the source code in place so we can decide about next steps together.

CI, “gradle check”, and End to End validation

The CI, test fixtures, backwards compatibility framework, and packaging implementation also relied on sending support artifacts and configuration settings to Elastic-hosted servers. To ensure the open source repository had a reliable CI infrastructure, we set up a new end-to-end automated testing infrastructure. The new CI infrastructure checks incoming pull requests and runs nightly builds and tests to catch regressions in the code base early. The initial number of checks is quite small, but we plan to maintain, build on, and collaboratively improve this initial configuration over time.

Renaming

Just about the time we got all of this done, we settled on the new name and started applying it to the code base. That meant going into our now somewhat stable repos and breaking everything all over again. For about two weeks our lives have pretty much looked like this:

After a lot of tedious resolving of merge conflicts, we finally got ourselves back to stable.
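As a rough illustration of what a rename like this involves at the source level, the sketch below rewrites Java package references in a piece of source text. It assumes the org.elasticsearch to org.opensearch package mapping discussed later in this thread; the function name and the regular expression are hypothetical and do not describe the tooling the team actually used.

```python
import re

# Assumed mapping for illustration: only the Java package prefix is rewritten.
# The lookarounds avoid touching longer identifiers such as "myorg.elasticsearchx".
PACKAGE_PATTERN = re.compile(r"(?<![\w.])org\.elasticsearch(?!\w)")


def rename_package_references(source: str) -> str:
    """Rewrite org.elasticsearch package references to org.opensearch in source text."""
    return PACKAGE_PATTERN.sub("org.opensearch", source)


if __name__ == "__main__":
    print(rename_package_references("import org.elasticsearch.common.Strings;"))
```

A textual rewrite like this only covers source references. Anything that stores class or package names in serialized form would need separate checks.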

Where we are now

As of this writing, both OpenSearch and OpenSearch Dashboards have all unit and integration tests passing. We haven’t added any new features or even bug fixes yet. We’re still using 7.10 as our version number, but we plan to reset that to 1.0 (you can see that work in progress in the versioning branch). We’re also still setting up CI nightly builds and signed artifacts, so keep an eye out for that.

The team (so far)

Finally, we wanted you to see some of the names you’ll start seeing on PRs and around the forums.

OpenSearch:
Abbas (abbashus), Charlotte (CEHENKLE), Harold (harold-wang), Himanshu (setiah), Nick (nknize), Rabi (adnapibar), Sarat (saratvemulapalli), Tianli (tlfeng)

OpenSearch Dashboards:
Anan (ananzh), Bishoy (boktorbb-amzn), Mihir (mihirsoni), Rocky (kavilla), Sean (seanneumann)

Infrastructure:
Barani (bbarani), Peter (peternied), Peter (peterzhuamazon)

The journey has just begun…

So that’s what we’ve been up to. Our goal is to build the best open-source distributed search engine on the planet. But more importantly, we want that search engine to be built by a passionate and diverse community — so please join us. Submit pull requests, write documentation, open issues (either on OpenSearch or OpenSearch Dashboards), read the news and attend the community meetings.

Let’s go invent together.

Very nice, thank you for sharing this info.

Otis

Based on earlier comments from @searchymcsearchface, it sounds like you aren’t ready to work on governance documentation, but a good first step would be to take this list of people and put them into a maintainers doc with more details about what each person is responsible for. Here’s a good example of this from the Harbor project.

I’ve done a lot of work on governance for various projects, so please feel free to reach out if you have governance questions or would like help / feedback.

One more thing … it’s probably also worth creating a separate “community” repo where you can put all of the things that are common across both projects, like community meetings, governance, guiding principles, etc. Here’s a nice, simple example from the Contour project.

Dawn,

Would you consider submitting pull requests for the maintainers doc with the info from above?

–David

@ke4qqq I would, but should someone create a community repo for this sort of thing first? Or should they be separate docs in the individual repos?

I think the current status makes separate docs in the individual repos the path of least resistance. I’ll chase getting another repo created, but I’m a piker in this particular community, and don’t have any whuffy to insta-create things.

I like the name.
We can start migrating plugins to OpenSearch!!!
Good job to everyone.

@aparo Yep. Stay tuned. LMK if you need any assistance getting your flatten plugin migrated over to OpenSearch.

@searchymcsearchface Are you trying to give me assistance? I like you!!! :grin: :grin:
I have managed ES code forks and written ES plugins for the last 11 years.

The issue won’t be migrating my code to OpenSearch (that will take me a few minutes: I’ve already forked the code and I’m building a local copy of OS).
Migrating org.elasticsearch to org.opensearch could also break the cluster state and the indices that were created with a custom ES index encoder, so I need to do some checks.

Here’s a PR for OpenSearch. If this looks good and is merged, I can follow up with one for OpenSearch Dashboards.

@aparo I had to dig around to find someone with Scala experience, but expect some feedback.

@searchymcsearchface If you need something, simply ask me, including by PM.

Thanks @dawnfoster. This sounds like a great idea :slight_smile: For now, we can do this in the individual repositories.