@aparo This is good news. However, does it maintain the same Java/Python APIs and signatures? In other words, if we have existing code that uses the current Elasticsearch Java/Python APIs, would it continue to work if we switch to these new OpenSearch Java/Python clients?
@aparo I tried generating the Python client from the OpenSearch repo, and it was successful. I believe types can be added, but typing depends on which Python versions the package should support. I would recommend publishing compatibility requirements. IMO, I would drop Python 2 support completely and require 3.7+, since aiohttp supports 3.7+.
The documentation still links to elastic.co. If that is left unchanged, I believe it will be difficult to keep client users aligned once OpenSearch is released.
Before discussing the work, I need some help with the license. The README.md states "Copyright 2021 Elasticsearch B.V. Licensed under the Apache License, Version 2.0", and the LICENSE file says Apache License. This is confusing to me.
Disclaimer: This is not legal advice, I am not a lawyer. I have worked in the open source legal space for 15+ years and worked closely with Red Hat Legal in my previous job.
@erickg, what you see in the README.md is what I would expect to see when you fork another project. They are attributing the copyright holder on the original work, and documenting the license that the work is under.
As you add/modify code to your fork, their copyright statement (and license for their copyrighted changes) still apply, so you need to be sure to retain that attribution (in README.md and wherever it appears in the code files, probably in the comment header). What you can do is append your own copyright statement, like this:
Copyright 2021 Elasticsearch B.V.
Copyright 2021 ErickG
Licensed under the Apache License, Version 2.0
You can definitely do this in README.md without issue, and you can make this change to any source files you modify. If you create entirely new files (that do not copy content from existing files), you do not need to include the Elasticsearch copyright attribution statement.
I am assuming, for simplicity, that your fork intends to keep the Apache License, Version 2.0, that you inherited from the upstream fork. It is possible for your changes to be under a different license, but it complicates things (including my answer) quite a bit, so my advice to you would be to keep your fork Apache 2.0.
If you have additional Copyright or License questions, please feel free to ask me, and I will do my best to help you.
The file that you linked to is a copy of the Apache License 2.0. I am also unsure where your confusion is coming from, as this matches the statement in README.md.
IMHO, the current Python API sucks because its design is a legacy of Python 2.x.
It doesn’t follow the current approach of using typed Python for methods and objects.
It’s OK to maintain it for existing code, but for new code it would be better to move to a more modern Python approach.
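To make the "typed methods and objects" point concrete, here is a minimal sketch of what a typed result could look like compared with passing raw dicts around. The names `Hit` and `parse_hits` are hypothetical and only for illustration; they are not part of any existing client. The only assumption about the server is the usual `hits.hits` layout of a search response.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Hit:
    """Typed view of one entry in the `hits.hits` array (hypothetical model)."""

    index: str
    id: str
    score: Optional[float]
    source: Dict[str, Any]


def parse_hits(response: Dict[str, Any]) -> List[Hit]:
    """Convert the raw `hits.hits` array of a search response into typed objects."""
    return [
        Hit(
            index=raw["_index"],
            id=raw["_id"],
            score=raw.get("_score"),        # may be absent or null, e.g. when sorting
            source=raw.get("_source", {}),  # absent when _source is disabled
        )
        for raw in response["hits"]["hits"]
    ]
```

With something like this, an editor or type checker can tell that `hit.score` may be `None`, which a plain dict cannot express.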
The same goes for the Java API: the High-Level client has a poor entity model design.
I’d be more than happy to be able to use features from Python 3.6 onwards if possible. To start, I would aim for being able to make API requests to OpenSearch 1.0 without issues.
I spent some time looking at the Python client codebase. There are various hacks that make it work as of today.
Since elasticsearch-py 7.* should be able to work with OpenSearch 1.0, I don’t think it matters if I break compatibility between the two client projects. I aim to build the client to talk to OpenSearch 1.x and drop Elasticsearch compatibility where keeping it is not possible. Correct me if I misunderstood.
I see some areas to improve:
Use AST to generate functions instead of using templates.
See if I can compose the API classes after the above point.
Then the package can go for native typing hints.
Support Python 3.6 at minimum.
Change the API function signatures to requests style. Basically, drop query_params support, which extracts kwargs into params. That is what confused me the most: why the client works differently from the HTTP requests in the Kibana console.
Then I don’t need to rename type to doc_type for URL query parameters.
Drop XPack API
Annotate network modules
A lot of renaming in comments and documentation
Type hints.
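To illustrate the signature point from the list above, here is a rough sketch of the two styles side by side. The `query_params` decorator below is a simplified stand-in I wrote for this post, not the real one from elasticsearch-py, and `index_legacy` / `index_explicit` are hypothetical functions that just return the request they would send instead of calling a cluster.

```python
from functools import wraps
from typing import Any, Callable, Dict, Optional, Tuple

Request = Tuple[str, str, Dict[str, Any], Dict[str, Any]]


def query_params(*allowed: str) -> Callable:
    """Simplified stand-in for the legacy decorator: moves known kwargs into `params`."""

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            params = dict(kwargs.pop("params", None) or {})
            for name in allowed:
                if name in kwargs:
                    # The keyword argument silently becomes a URL query parameter.
                    params[name] = kwargs.pop(name)
            return func(*args, params=params, **kwargs)

        return wrapper

    return decorator


@query_params("refresh", "routing")
def index_legacy(
    index: str, body: Dict[str, Any], params: Optional[Dict[str, Any]] = None
) -> Request:
    """Legacy style: query parameters arrive as loose keyword arguments."""
    return ("POST", f"/{index}/_doc", params or {}, body)


def index_explicit(
    index: str, body: Dict[str, Any], params: Optional[Dict[str, Any]] = None
) -> Request:
    """Requests style: query parameters are passed explicitly, as in an HTTP call."""
    return ("POST", f"/{index}/_doc", dict(params or {}), body)
```

Calling `index_legacy("books", doc, refresh=True)` and `index_explicit("books", doc, params={"refresh": True})` build the same request, but the second one reads like the HTTP request you would type in a console, and the signature is easy to annotate.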
Changes with OpenSearch
Bulk ingestion errors are very difficult to diagnose
lz4 compression (this needs cluster support)
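On the bulk error point, a small helper along these lines might already help. This is only a sketch: `summarize_bulk_errors` is a hypothetical name, and it assumes the usual bulk response shape, with a top-level `errors` flag and an `items` list in which each entry is keyed by its action (`index`, `create`, `update`, `delete`).

```python
from typing import Any, Dict, List


def summarize_bulk_errors(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the failed items of a bulk response into flat, readable records."""
    failures: List[Dict[str, Any]] = []
    if not response.get("errors"):
        return failures

    for position, item in enumerate(response.get("items", [])):
        # Each item is a single-key dict such as {"index": {...}}.
        for action, result in item.items():
            error = result.get("error")
            if error is None:
                continue
            is_dict = isinstance(error, dict)
            failures.append(
                {
                    "position": position,  # offset of the action in the request
                    "action": action,
                    "index": result.get("_index"),
                    "id": result.get("_id"),
                    "status": result.get("status"),
                    "type": error.get("type") if is_dict else None,
                    "reason": error.get("reason") if is_dict else str(error),
                }
            )
    return failures
```

The `position` field is there so that a failure can be matched back to the document that was sent, which is the part I find hardest to do today.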
It is probably better to have a pipeline with OpenSearch sooner rather than later. Some of the elasticsearch-py tests run in a Jenkins pipeline and require a cluster.
What do you think is important for you? Any other thoughts?