OpenSearch Client Java & Python APIs

Disclaimer: This is not legal advice, I am not a lawyer. I have worked in the open source legal space for 15+ years and worked closely with Red Hat Legal in my previous job.

@erickg, what you see in the README.md is what I would expect to see when you fork another project. They are attributing the copyright holder on the original work, and documenting the license that the work is under.

As you add/modify code to your fork, their copyright statement (and license for their copyrighted changes) still apply, so you need to be sure to retain that attribution (in README.md and wherever it appears in the code files, probably in the comment header). What you can do is append your own copyright statement, like this:

Copyright 2021 Elasticsearch B.V. 
Copyright 2021 ErickG

Licensed under the Apache License, Version 2.0

You can definitely do this in README.md without issue, and you can make this change to any source files you modify. If you create entirely new files (that do not copy content from existing files), you do not need to include the Elasticsearch copyright attribution statement.

I am assuming, for simplicity, that your fork intends to keep the Apache License, Version 2.0, that you inherited from the upstream project. It is possible for your changes to be under a different license, but it complicates things (including my answer) quite a bit, so my advice to you would be to keep your fork Apache 2.0.

If you have additional Copyright or License questions, please feel free to ask me, and I will do my best to help you.

Thanks, that’s helpful. Of course, I would love to have Apache License.

I have a question about source repo elasticsearch-py’s license. It says

Copyright 2021 Elasticsearch B.V. Licensed under the Apache License, Version 2.0.

This doesn’t align with the APL declaration in elasticsearch-py/LICENSE at master (elastic/elasticsearch-py on github.com). Does that mean the code is not APL anymore?

I’m not sure I follow 100% - copyright and license are two separate things. The repo you linked looks like Apache to me.

@spotfoss Thoughts?

The file that you linked to is a copy of the Apache License 2.0. I am also unsure where your confusion is coming from, as this matches the statement in README.md.

IMHO, the current Python API is poor because its design is a legacy of the Python 2.x era.
It doesn’t follow the current approach of using Python type hints for methods and objects.
It’s fine to maintain for existing code, but for new code it would be better to move to a more modern Python approach.
The same goes for the Java API, where the High-Level client has a poor entity model design.

Thanks. It makes sense to me now. Then the library can be licensed as Apache License 2.0 with updated copyright holder when it moves on.

I’d be more than happy to be able to use features from Python 3.6 onwards if possible. To start, I would aim for being able to make API requests to OpenSearch 1.0 without issues.

I spent some time looking at the Python client codebase. There are various hacks that make it work as of today.
Since elasticsearch-py 7.* should be able to work with OpenSearch 1.0, I don’t think it matters if I break compatibility between the two client projects. I aim to build the client to talk to OpenSearch 1.x and drop Elasticsearch compatibility where keeping it is not possible. Correct me if I have misunderstood.

I see some areas to improve:

  • Use an AST to generate functions instead of using templates.
  • See if I can compose API classes after the above point.
  • Then the package can move to native type hints.
    • Support Python 3.6 at least to start.
  • Change API function signatures to requests style. Basically, drop query_params support, which extracts kwargs into params. This was the most confusing part for me: the client works differently from the HTTP requests in the Kibana console.
    • Then I don’t need to rename type to doc_type for URL query parameters.
  • Drop the XPack API.
  • Annotate the network modules.
  • A lot of renaming in comments and documentation.
  • Type hints.
  • Changes with OpenSearch
    • Bulk ingestion errors are very difficult to identify.
    • lz4 compression (this needs cluster support).
    • It is probably better to have a pipeline with OpenSearch sooner rather than later. Some elasticsearch-py tests run in a Jenkins pipeline and require a cluster.
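To illustrate the signature change proposed in the list above, here is a minimal sketch that contrasts the two styles. It is not the actual elasticsearch-py code: the `query_params` decorator below is a simplified, hypothetical re-creation of the kwargs-extracting pattern, and `search_legacy`, `search_explicit` and the returned request tuple are names invented for this example. No network call is made; both functions only build a description of the request.

```python
import functools
from typing import Any, Callable, Dict, Optional, Tuple

# (method, path, query parameters, JSON body); hypothetical request description
Request = Tuple[str, str, Dict[str, Any], Optional[Dict[str, Any]]]


def query_params(*allowed: str) -> Callable:
    """Simplified stand-in for a decorator that moves known kwargs into params."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            params = {}
            for name in allowed:
                if name in kwargs:
                    # The caller never sees this step: kwargs silently become
                    # URL query parameters.
                    params[name] = kwargs.pop(name)
            return func(*args, params=params, **kwargs)

        return wrapper

    return decorator


@query_params("size", "from_")
def search_legacy(index, body=None, params=None):
    """Legacy style: query parameters and body are mixed in one kwargs bag."""
    return ("POST", f"/{index}/_search", params or {}, body)


def search_explicit(
    index: str,
    *,
    params: Optional[Dict[str, Any]] = None,
    json: Optional[Dict[str, Any]] = None,
) -> Request:
    """requests style: query parameters and JSON body are passed explicitly."""
    return ("POST", f"/{index}/_search", dict(params or {}), json)


query = {"query": {"match_all": {}}}
legacy = search_legacy("logs", size=5, body=query)
explicit = search_explicit("logs", params={"size": 5}, json=query)
```

Both calls describe the same request, but only the second one makes it visible at the call site which values end up in the URL and which in the body. The explicit signature is also straightforward to annotate with type hints.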

What do you think is important for you? Any other thoughts?

That looks like a great list. A few things to consider:

  1. Keep an eye on the project roadmap. The project uses semantic versioning, so no breaking changes are expected in 1.x, but at 2.0 you’ll start to see some breaking changes (as an example, master being renamed to main, or whatever the new term ends up being).
  2. The OpenSearch plugins could probably have a native API in the client.
  3. I’m not super familiar with this aspect of the Python client, but having extensibility for additional plugins would be really helpful.

I’m trying to do many of them in my client fork, aparo/opensearch-client-generator on GitHub ("OpenDistro Client code generator to be used with Elasticsearch").
I removed the xpack stuff.
I had a working typed AST in Scala that can be used to generate code in different languages.
I want:

  • target python 3.6 or above.
  • Asyncio by default (with tips to call old python blocking style code)
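As a rough sketch of what "asyncio by default" with a bridge for blocking code could look like: the `AsyncClient` class, its `info` method, the stubbed response and the `run_blocking` helper are all hypothetical names made up for this example, not part of any released client. Note that `asyncio.run` requires Python 3.7+; on Python 3.6 the same effect needs `loop.run_until_complete`.

```python
import asyncio
from typing import Any, Awaitable, Dict, TypeVar

T = TypeVar("T")


class AsyncClient:
    """Hypothetical async-first client; the transport is stubbed out."""

    async def info(self) -> Dict[str, Any]:
        # Stands in for a real non-blocking HTTP round trip.
        await asyncio.sleep(0)
        # Stubbed response, invented for this example.
        return {"version": {"distribution": "opensearch", "number": "1.0.0"}}


def run_blocking(awaitable: Awaitable[T]) -> T:
    """Bridge for old blocking-style code: run one coroutine to completion.

    Must not be called from inside an already running event loop.
    """
    return asyncio.run(awaitable)


# Blocking-style usage from synchronous code.
cluster_info = run_blocking(AsyncClient().info())
```

Code that is already async would simply `await AsyncClient().info()` and never touch the bridge.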

I hope to be able to work on it at night or in my spare time ;-(

Hmm, maybe I can wait for a client build from you. I can check what features or transport layer are missing from the generated client.

@searchymcsearchface Most of the client code is generated from a set of spec files from the upstream build.

  1. The OpenSearch plugins could probably have a native API in the client.
  2. I’m not super familiar with this aspect of the Python client, but having extensibility for additional plugins would be really helpful.

As long as the spec is generated somewhere, a generator can generate client code from those spec files. I am not sure how the spec is generated or written. This seems to belong to the OpenSearch repo.

  1. Keep an eye on the project roadmap. The project uses semantic versioning, so no breaking changes are expected in 1.x, but at 2.0 you’ll start to see some breaking changes (as an example, master being renamed to main, or whatever the new term ends up being).

I think @aparo and I are both trying not to break compatibility when talking to an OpenSearch cluster. But we do want to ask users to change their code to use the new client library. As for how much change users have to make, that can be discussed. All in all, elasticsearch-py 7.x should be able to talk to OpenSearch from day 1. That sounds like a good enough compromise.

Plus one to Asyncio!

Is there anyone looking into also migrating the Elasticsearch DSL Python library yet?

The Elasticsearch Python client just got a product check built-in, that will even prevent v7.14+ from being used with e.g. the 7.10 oss-distribution, hence forcing people to use 7.11 and thereby Elastic License v2…

Can you send a link to the PR that breaks it?

Here’s the related PRs and issues:

The NodeJS library has the same “feature”: “[Backport 7.x] Verify connection to Elasticsearch”, Pull Request #1497 in elastic/elasticsearch-js.

Thanks for calling these out. :expressionless:

Hi,

Has there been any work on a Python client to replace elastic/elasticsearch-py (the official Elasticsearch client library for Python) yet?
I either need to pull the previous version before the ‘license check’ or move to a new lib.

Thanks,

Tony.

Yep. Work is happening, but it’s going to be a few weeks until the client libs are public (renaming and updating license headers requires both engineering and legal review). Yesterday the first non-AWS committer was invited to a client lib, which should help the velocity.

Until then, use the client version right before the check was implemented (usually the highest 7.13.x); see the “Compatibility” page in the OpenSearch documentation.
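A minimal sketch of that rule as a guard in application code, assuming the cut-off reported in this thread for the Python client (the check arrived in 7.14, so 7.13.x is the last line without it). The helper names are hypothetical, the version strings are only illustrative, and other language clients may use a different cut-off, so check the compatibility documentation before relying on it.

```python
from typing import Tuple

# Assumption taken from this thread: the product check arrived in 7.14.
PRODUCT_CHECK_INTRODUCED = (7, 14, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain 'major.minor.patch' string into a comparable tuple."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_before_product_check(version: str) -> bool:
    """True if this client version predates the product check."""
    return parse_version(version) < PRODUCT_CHECK_INTRODUCED


ok_version = is_before_product_check("7.13.4")
blocked_version = is_before_product_check("7.14.0")
```

In practice the same constraint would normally live in the dependency pin of the project rather than in code; the helper only makes the rule explicit.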

The Python clients (DSL and low-level) have been released.

repos