I’m investigating deploying a production Open Distro For Elasticsearch cluster, and want to make sure with all the renaming/reorganization that’s taking place, that this cluster will be cleanly upgradable to the new OpenSearch suite, whatever it will look like.
For example, the debian install page has steps to manually install an elasticsearch deb from elastic.co, and to install a package named opendistroforelasticsearch (which IIRC is a temporary name). I assume that these are temporary-ish install instructions, and that a more formal install procedure will be put into place in the future, and I want to make sure that any servers installed now will be upgrade-compatible with new releases in the future.
Also, wanted to say thanks to this project and its contributors – ODFE has worked great so far in my testing.
Let’s back up a bit. Open Distro for Elasticsearch has existed since 2019. Open Distro does use OSS Elasticsearch and Kibana and adds in open source plugins. The install page you are referencing is correct for a production ready install today.
OpenSearch is a forked version of Elasticsearch and Kibana (the latter as OpenSearch Dashboards) and will include the plugins used in Open Distro. Right now it’s alpha and there are no artifacts yet. When the project reaches beta stage there will be artifacts and an install guide, but it still won’t be production ready.
I expect general availability in late summer (mid-ish 2021). You should be pretty safe going from Open Distro to OpenSearch. Most of the changes are internal and/or cosmetic, so as a user I would expect it to be an easy upgrade path. Expect more on upgrade paths as the project gets into the later stages of development.
Got it, thanks for the clarification between ODFE and OpenSearch.
One other question, does OpenSearch plan to remain API-compatible with Elasticsearch, at least with respect to core Elasticsearch functionality (excluding x-pack stuff)? In other words, should we expect any application built to hook into an Elasticsearch server to function with OpenSearch?
I would expect it would - any application that does things like checking the version string might have a problem though, as OpenSearch will report its version differently.
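For example, a client that parses the response from `GET /` on the cluster root endpoint will see different fields. A minimal sketch, assuming trimmed, illustrative payloads (real responses carry more fields, and exact values may differ by release): OpenSearch 1.x identifies itself with a `version.distribution` field, which Elasticsearch does not have, so strict checks like `version["number"].startswith("7.")` can break.

```python
# Sketch: deciding how to treat a cluster from the `GET /` root response.
# The sample payloads below are illustrative, not exact server output.

def classify_cluster(root_info: dict) -> str:
    """Return 'opensearch' or 'elasticsearch' based on the root endpoint body."""
    version = root_info.get("version", {})
    # OpenSearch reports a "distribution" field inside "version";
    # Elasticsearch has no such field, so its absence implies ES.
    if version.get("distribution") == "opensearch":
        return "opensearch"
    return "elasticsearch"

# Illustrative payloads (trimmed):
es_info = {"version": {"number": "7.10.2"}}
os_info = {"version": {"number": "1.0.0", "distribution": "opensearch"}}

print(classify_cluster(es_info))  # elasticsearch
print(classify_cluster(os_info))  # opensearch
```

An application that pins on the reported version number rather than on feature detection is the kind most likely to need a patch when pointed at OpenSearch.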
If you are planning on a relatively small system, say a couple of data nodes and a master node, opendistro will be fine.
If, otoh, you are planning a major 500m+ doc system, requiring the full gamut of node types (data, master, ingest, coordinating, hot/warm/cold/frozen, etc.), opendistro is not production ready as at v7.10, imvho.
There is also the issue of the AWS penchant for Blue/Green updates (making copies of the whole cluster, then swapping over to the new version), as opposed to a rolling upgrade, which allows you to keep the whole cluster live while updating. Opendistro kind of requires the former, and is harder with the latter.
The current version (7.10) of opendistro does not support the PIT (Point in Time) approach to paging through long queries.
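For reference, the PIT flow that Elasticsearch 7.10 introduced works roughly as follows. This is a sketch of the request bodies involved, with no live cluster required; the endpoint paths follow the Elasticsearch documentation, and the id values are placeholders:

```python
# Sketch of the Elasticsearch 7.10+ point-in-time (PIT) paging flow,
# expressed as the JSON bodies involved.
# 1. Open a PIT:        POST /my-index/_pit?keep_alive=1m  -> returns {"id": ...}
# 2. Search against it: POST /_search with a "pit" clause (no index in the URL)
# 3. Close it:          DELETE /_pit with {"id": ...}

def pit_search_body(pit_id: str, search_after=None, size: int = 1000) -> dict:
    """Build the body for one page of a PIT search."""
    body = {
        "size": size,
        "query": {"match_all": {}},
        # The PIT id pins the index, shards and point-in-time state.
        "pit": {"id": pit_id, "keep_alive": "1m"},
        # _shard_doc is the cheap tiebreaker sort recommended for PIT paging.
        "sort": [{"_shard_doc": "asc"}],
    }
    if search_after is not None:
        # Resume from the sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body

first_page = pit_search_body("example-pit-id")
next_page = pit_search_body("example-pit-id", search_after=[42])
```

Unlike a scroll, a PIT is decoupled from any single search request, which is what makes it more exact for long-running paged reads.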
The issue with OpenSearch will be that it is similar to the AWS managed service, which is geared towards selling machines, and not towards flexibility.
I have wasted a month with a client trying to get opendistro to function in any way effectively at the base level. We have now swapped back, as we need solid systems for multi-billion-docs-per-annum stores.
If any of this raises alarm bells, then you should try sorting it with your chosen flavour… if you are on a small system (small is less than 250m docs), then opendistro will be just fine.
This is just… Wrong.
I am using the latest ODFE for 50B+ documents and growing, have done a rolling upgrade several times by now and am having a much easier time than when I was using plain ES.
Our system is also used for massive real-time analysis and indexing with thousands of different fields.
I have no idea what you are on about, but ODFE has been a blessing as far as we’re concerned for the past year and a half or so.
That’s really cool, and I’m pleased it’s working for you. May I ask what your setup is?
How many of each of these:
Masters (3+ obviously).
Data nodes
Ingest nodes
Coordinating (Query) nodes
That’s where I hit issues with our ODFE recently. I have also set up major systems… 33Bn docs per annum being the largest to date, but I could not get ODFE to do the right thing.
The AWS service does a Blue/Green update… I have had direct experience of it locking us out on a major system, and have also spoken directly with the developers on this. It has improved a bit recently, but still does a Blue/Green, with a smaller lockout time.
PIT did not exist as of 3 weeks ago… maybe that’s been added now, whereas it does exist under Elastic OSS, and is far more exact than scroll… it fits my use case better.
I am really glad that someone has it working fine for their setup… so that’s great.
We are actually able to utilize it with barely 3 nodes (all are M/C/D), using ZFS as the underlying FS and deviating a bit from the standard advice: 100 GB Xmx, 300 GB for the ZFS ARC cache, and the rest kept for overhead.
It’s based on a few Raidz2 vdevs (each four disk wide) under one pool in each node.
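As a rough sketch, that split corresponds to settings along these lines. The file locations and exact keys are assumptions about a typical Linux/ODFE 7.x install, not a copy of this poster’s configuration; the heap size deliberately deviates from Elastic’s usual sub-32 GB compressed-oops guidance, as described above:

```
# jvm.options -- 100 GB heap (deviates from the usual <=32 GB advice)
-Xms100g
-Xmx100g

# /etc/modprobe.d/zfs.conf -- cap the ZFS ARC at ~300 GiB (value in bytes)
options zfs zfs_arc_max=322122547200

# elasticsearch.yml -- every node is master-eligible, data and ingest (M/C/D)
node.master: true
node.data: true
node.ingest: true
```

The point of capping the ARC is to stop the ZFS cache and the JVM heap from competing for the same RAM.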
We had an earlier cluster with 10 data nodes + 3 Master nodes with 64 GB/12 cores each which did just as well with some different configuration.
We’ve found ODFE to be quite versatile in the process.
Adding on top of that, we’ve been able to index 1B+ documents in 2 hours with this setup, with no failures or queueing on a real workload. That is what led us to deviate from Elastic’s standard recommendations.
Right… that’s roughly what I expected. Sounds like a mean setup, but obviously built around your use case for optimisation.
I have a need to fully utilise the main node types, with probably warm and cold data nodes also. We will also have ingestion and query points geographically distant… globally… so I have a real need to ensure that data nodes don’t have to deal with major ingestion, nor queries… only with their own data. Similarly, we will need different levels of access in different areas, so independent expansion of each node type is paramount.
This is where ODFE let me down, at least on the v7.10 I was trying out. For example, I could not have an Ingest node on its own; it had to be a Data node also. Similarly, for some weird reason, each data node I set up decided it would also be a master node, which was just plain weird.
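For what it’s worth, in stock Elasticsearch 7.x (which ODFE 1.13 is built on) the legacy role flags all default to true, which would explain nodes coming up master-eligible and data-capable unless the unwanted roles are switched off explicitly. A sketch of the per-node `elasticsearch.yml` role settings, assuming the legacy 7.x flags rather than the newer `node.roles` list:

```yaml
# elasticsearch.yml -- dedicated ingest node
# (all three flags default to true, so roles must be disabled explicitly)
node.master: false
node.data: false
node.ingest: true
```

```yaml
# elasticsearch.yml -- data-only node, not master-eligible
node.master: false
node.data: true
node.ingest: false
```

Whether ODFE 1.13 honoured these flags in this poster’s environment is exactly the point in dispute here.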
I spent 2 weeks playing with the configs… then just gave up, and went back to Elastic, which I set up in a weekend.
I first used ES at v0.9 (if I remember right), with a 15-20 data-node cluster over 3 datacentres, on 3 continents… it worked just fine. All 64 GB machines. I have also used the AWS managed service, and never understood that Blue/Green update process… ES (like Mongo) is an ‘always up’ system when done right.
Hey ho… as I say… you have found a use case where ODFE works for you… I have found one where it is distinctly sub-optimal for me. That’s software for you.
However… thanks for the info on your setup. 1B+ every 2hrs is not inconsiderable. Congrats on that.
It’s possible that critical bug/security fixes will happen for Open Distro, but it’s not something planned, so whether a given issue would trigger a release will be decided case by case.