Querying for similar documents using Minhash

We have built an index of minhashes for our documents, and we’ve also set up a query pipeline to generate a minhash of the document we want to search for. However, we are not sure how to use those values to build a document similarity query.

The documentation for the filter/analyser includes a Python script that makes use of the minhash values, but I can’t find any documentation on how to use hash values in an OpenSearch query. How do I do a Jaccard similarity query between the search document hash and the hashes in the index?

@tom_greasley As you might be aware, the Jaccard similarity score is the size of the intersection of two sets divided by the size of their union.
As far as I’m aware, OpenSearch doesn’t have a built-in Jaccard function to calculate the similarity score, which is why the documentation uses an example Python script instead.
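
For illustration, here is a minimal sketch of that calculation done client-side in Python. This is not the script from the documentation: the function name and the sample tokens are made up, and it assumes you have already retrieved the minhash tokens for both documents as plain lists of strings.

```python
def jaccard_similarity(hashes_a, hashes_b):
    """Estimate Jaccard similarity from two collections of minhash tokens."""
    set_a, set_b = set(hashes_a), set(hashes_b)
    union = set_a | set_b
    if not union:
        # Two empty sets: treat the similarity as 0 rather than dividing by zero.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical minhash tokens for the query document and one indexed document.
query_hashes = ["h1", "h2", "h3", "h4"]
indexed_hashes = ["h2", "h3", "h4", "h5"]

# 3 shared tokens out of 5 distinct tokens overall.
score = jaccard_similarity(query_hashes, indexed_hashes)
print(score)
```

Note that this compares one pair of documents at a time, so it only helps once you already have candidate documents to score; it does not replace a query against the index.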

Having the minhash in the index seems a little redundant without the ability to query with it. I just wanted to double-check that I’d not missed something.

It appears the ability to query using Jaccard distance was added to Solr (and I think Lucene) shortly after the filter was added: