Multi-lingual search

Hi,

I’m trying to set up multilingual search, since my application places no restriction on language.

Right now what I have is full-text search with some fuzziness. But it would be great if I could have something like what is depicted in this video.

So that when someone searches for cancel or annuler (which means cancel in French), all documents containing either of these words will be a hit.

I’m using OpenSearch v1.2.

Thanks for any advice regarding this.

Hi @cicada ,

I think what you might like is to be able to use the “Multilingual Universal Sentence Encoder” for indexing and querying in OpenSearch. If I understand it correctly, this would give you what you are looking for. There is a pretrained model for 16 languages available under a friendly license (AL2): TensorFlow Hub

I would expect that in the end this could be some combination of kNN search and a model specification in the ML plugin.

But I am not sure if this is possible OOTB with OpenSearch now. Perhaps someone more familiar with the ML plugin can chime in. However, if this is not possible now, it would definitely be a great topic to explore.

Edit: I noticed you are running OS 1.2, in which case you might be limited on the ML plugin side. You would probably be better off updating to a more recent OS version.

Regards,
Lukáš

Hi @lukas-vlcek ,

Thanks for your reply. I did stumble across interesting topics like this one.

As you’ve already mentioned, an OOTB solution for OS 1.2 is unlikely. But if I can get the vectors for the searchable text beforehand, let’s say through another service call, then I can index that vector data along with the document and will be able to use kNN, right?
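
To make the idea concrete, here is a rough sketch of what I have in mind. It only builds the request bodies, it does not talk to a cluster. The field name `text_vector`, the `embed` helper, and the dimension of 512 are my assumptions: `embed` is a stand-in for the external service call and returns a meaningless pseudo-vector just so the sketch runs, and the dimension would have to match whatever encoder is actually used. The `knn_vector` field type and the `knn` query come from the k-NN plugin.

```python
import hashlib
from typing import Dict, List

DIM = 512              # assumed embedding size; must match the real encoder's output
FIELD = "text_vector"  # hypothetical name for the vector field


def embed(text: str) -> List[float]:
    """Stand-in for the external embedding service (hypothetical).

    Returns a deterministic pseudo-vector derived from a hash so the sketch
    runs without a model. It carries no semantic meaning at all.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(DIM)]


# Index definition: enable k-NN on the index and declare a knn_vector field.
index_body: Dict = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            FIELD: {"type": "knn_vector", "dimension": DIM},
        }
    },
}


def make_document(text: str) -> Dict:
    """Build a document that stores the original text next to its vector."""
    return {"text": text, FIELD: embed(text)}


def make_knn_query(query_text: str, k: int = 10) -> Dict:
    """Build a k-NN search body for the vector of the query text."""
    return {
        "size": k,
        "query": {"knn": {FIELD: {"vector": embed(query_text), "k": k}}},
    }


if __name__ == "__main__":
    doc = make_document("annuler la commande")
    query = make_knn_query("cancel", k=5)
    print(len(doc[FIELD]), query["size"])
```

The bodies would then go to the usual REST endpoints: `index_body` when creating the index, `make_document(...)` when indexing, and `make_knn_query(...)` against `_search`. With a real multilingual encoder behind `embed`, "cancel" and "annuler" should end up close together in vector space, which is what would make both a hit.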