Length Limit on _id

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): Future

Describe the issue:

I was recently reading the page on the bulk API and noticed that it restricts the value of _id to 512 bytes. That seemed very arbitrary and potentially very irritating for anyone who has a system in which the natural way of identifying a document is longer. The simplest such case I can think of is a filesystem or document store with deeply nested paths, or in which some files also have descriptive names, or indexing web content where the URL is a natural identifier.

One can work around this by having _id contain some form of hash, or by assigning and tracking GUIDs, etc., and adding an additional “actual Id” field. But that is all a layer of indirection, potential confusion, and additional engineering that seems unnecessary, so I decided to look into why this restriction was put in place. I was expecting to find some sort of interesting performance-based argument.
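For anyone who wants to see what the hash workaround looks like in practice, here is a minimal sketch. It assumes SHA-256 as the hash and a field named `actual_id` for the original identifier; both are my own illustrative choices, not anything OpenSearch prescribes, and the helper names are hypothetical.

```python
import hashlib


def hashed_id(natural_id: str) -> str:
    """Derive a fixed-length _id (64 hex characters) from an arbitrarily long key."""
    return hashlib.sha256(natural_id.encode("utf-8")).hexdigest()


def with_actual_id(natural_id: str, source: dict) -> tuple[str, dict]:
    """Return (_id, document body), keeping the natural key in the body.

    The field name "actual_id" is an assumption made for this example.
    """
    doc = dict(source)  # copy so the caller's dict is not modified
    doc["actual_id"] = natural_id
    return hashed_id(natural_id), doc


# A URL-style identifier that is well over 512 bytes when UTF-8 encoded.
long_url = "https://example.com/" + "/".join(["deeply-nested-segment"] * 40)
doc_id, doc = with_actual_id(long_url, {"title": "Example page"})
print(len(long_url.encode("utf-8")), len(doc_id))
```

Every lookup by natural key then has to recompute the hash first, which is exactly the extra layer of indirection I would like to avoid.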

I tracked it down to this issue: Elasticsearch should reject _id longer than the maximum URI length · Issue #16034 · elastic/elasticsearch · GitHub

In summary, it seems that this restriction was put in place to ensure that a URL including the value for _id could be pasted into or sent from a browser without running into the browser’s URL length limitations. There’s mention of some limit in the Java client, but Java itself doesn’t limit the length of URLs or URIs as far as I can tell, and anything else should be under OpenSearch’s control. This StackOverflow post seems to suggest that all of the important web browsers have long since fully supported MUCH longer URLs. The only remaining laggards are the Microsoft address bar and IE11 JavaScript.

Since the world has moved on, and become a better (longer URL) place, is it time to consider lifting or raising this limit?

@gus.heck Thank you for the post. This seems like a good candidate for a feature request. I would recommend going ahead and raising it here