Curious if anyone has successfully used Luke to explore the Lucene index within their OD deployment? We need to dive in at that level to troubleshoot a particularly pesky issue. Is one of the Luke binaries hiding within our OD setup by chance? Or, if anyone has gone down this rabbit hole already, any tips or tricks you'd be willing to share?
If you’re self-hosting your OpenSearch cluster, you can find the directory holding the Lucene files for each shard under a path like $OPENSEARCH_ROOT/data/nodes/0/indices/sQ3gze7NQe2IsN1yOyyAlw/0/index. You can copy that directory somewhere else and hit it with Luke.
The catch is that (depending on the version) the index name gets mapped to an internal, gibberish-looking UUID on disk (like sQ3gze7NQe2IsN1yOyyAlw). The good news is that you can resolve the index name to that UUID using the /_cat/indices API:
```
health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   security-auditlog-2023.02.24 h1dNpm6iSdCLOSij_Utf2g   1   1          2            0     25.3kb         25.3kb
yellow open   security-auditlog-2022.11.28 sQ3gze7NQe2IsN1yOyyAlw   1   1          3            0     55.9kb         55.9kb
```
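In case it helps, here's a self-contained sketch of resolving an index name to its UUID by parsing that output. Against a live cluster you'd pipe `curl -s 'localhost:9200/_cat/indices?v'` into the same awk filter; here the sample output above stands in for the curl call, so the host and index name are just placeholders.

```shell
INDEX_NAME="security-auditlog-2022.11.28"

# Match the index column (3rd) and print the uuid column (4th).
UUID=$(awk -v idx="$INDEX_NAME" '$3 == idx { print $4 }' <<'EOF'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open security-auditlog-2023.02.24 h1dNpm6iSdCLOSij_Utf2g 1 1 2 0 25.3kb 25.3kb
yellow open security-auditlog-2022.11.28 sQ3gze7NQe2IsN1yOyyAlw 1 1 3 0 55.9kb 55.9kb
EOF
)
echo "$UUID"   # sQ3gze7NQe2IsN1yOyyAlw
```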
You can figure out which node holds which shards using the /_cat/shards API.
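Same idea for shards, a sketch with made-up node names and IPs standing in for `curl -s 'localhost:9200/_cat/shards'` (the columns are index, shard, prirep, state, docs, store, ip, node); the filter picks out the node holding each primary shard:

```shell
INDEX_NAME="security-auditlog-2022.11.28"

# $3 == "p" keeps primaries only; replicas (or unassigned shards) are skipped.
MAPPING=$(awk -v idx="$INDEX_NAME" '$1 == idx && $3 == "p" { print "shard " $2 " -> " $8 }' <<'EOF'
security-auditlog-2022.11.28 0 p STARTED 3 55.9kb 172.18.0.2 node-1
security-auditlog-2022.11.28 0 r UNASSIGNED
security-auditlog-2023.02.24 0 p STARTED 2 25.3kb 172.18.0.3 node-2
EOF
)
echo "$MAPPING"   # shard 0 -> node-1
```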
So, the basic steps are:

1. Find the UUID for your index using /_cat/indices.
2. Use /_cat/shards to figure out which nodes hold the relevant shards.
3. For each relevant shard, log onto the appropriate node and find the Lucene directory under $OPENSEARCH_ROOT/data/nodes/<node_num>/indices/<UUID>/<shard_num>/index.
4. Copy the Lucene directory somewhere else.
5. (Probably) delete the write.lock file from the copy. (Never delete it from the directory managed by OpenSearch!)
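The copy-and-unlock part of those steps looks roughly like the following. This uses a throwaway temp directory with stand-in files instead of a real data node, so all paths and file names are illustrative; the one thing that matters is that write.lock is deleted only from the copy, never from the live directory.

```shell
# Fake data-node layout standing in for a real shard directory.
WORK=$(mktemp -d)
SHARD_DIR="$WORK/data/nodes/0/indices/sQ3gze7NQe2IsN1yOyyAlw/0/index"
mkdir -p "$SHARD_DIR"
touch "$SHARD_DIR/segments_2" "$SHARD_DIR/write.lock"   # stand-ins for real Lucene files

# Copy the shard's Lucene directory somewhere else...
COPY="$WORK/luke-copy"
cp -r "$SHARD_DIR" "$COPY"

# ...and remove the lock file from the COPY only, so Luke can open it.
rm "$COPY/write.lock"

ls "$COPY"        # copy is unlocked
ls "$SHARD_DIR"   # original still has its write.lock
```

From there you can point Luke at the copied directory; recent Lucene binary releases bundle Luke with a launch script, so you don't need a separate download.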