Rank based on rarity of a field value

Hi :vulcan_salute:

I’d like to know how I can rank items lower when they have field values that appear frequently among the results.
Say we have a result set like this:

"name": "Red T-Shirt"
"store": "Zara"

"name": "Yellow T-Shirt"
"store": "Zara"

"name": "Red T-Shirt"
"store": "Bershka"

"name": "Green T-Shirt"
"store": "Benetton"

I’d like to rank the documents so that those sharing a frequently occurring field value
(“store” in this case) are deboosted and appear lower in the results.
This is to achieve a bit of variety, so that the search doesn’t yield top results from the same store.

In the example above, if I search for “T-Shirt”, I want to see one Zara T-Shirt at the top, and the rest
of the Zara T-Shirts should appear lower, after all the other stores.

So far I have looked into sorting by aggregation buckets and into script sorting, but without success.
Is it possible to achieve this inside of the search engine?

Many thanks in advance!

Hello,

I don’t think you can get this exact result natively, but there are some options that are close enough, IMO. Here’s one:

  • you can collapse search results, for example to show one T-shirt per unique store
  • on a second query, you can show the rest, maybe excluding the ones you already showed
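As a rough sketch of the two steps above, the request bodies could be built like this in Python and sent to the `_search` endpoint with whatever client you use. The field names are assumptions: `name` is taken from the example documents, and `store.keyword` assumes the store is also indexed as a keyword sub-field, since collapsing needs a keyword or numeric field. The helper names are made up for illustration.

```python
import json


def first_page_query(text, store_field="store.keyword"):
    """Body for the first search: best hit per store via field collapsing.

    `store_field` is an assumed keyword sub-field of `store`.
    """
    return {
        "query": {"match": {"name": text}},
        "collapse": {"field": store_field},
    }


def second_page_query(text, shown_ids):
    """Body for the follow-up search: same match, minus documents already shown."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"name": text}},
                # Exclude the _id values returned by the first search.
                "must_not": {"ids": {"values": list(shown_ids)}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(first_page_query("T-Shirt"), indent=2))
    # "1" and "4" stand in for the _id values of the first response.
    print(json.dumps(second_page_query("T-Shirt", ["1", "4"]), indent=2))
```

The application would show the hits of the first response, then append the hits of the second one, which gives "one per store first, the rest afterwards".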

We can think of others that are similar (e.g. using the top_hits aggregation). Note that in general, the default similarity de-boosts words that appear more often, if they match words from your query. So if you search for “zara OR bershka”, the Bershka T-shirts will come out on top, because “bershka” appears in fewer documents and is therefore “more specific” to your query. But if you just want variety in the search results, then you’ll want to do some field collapsing or maybe inject a random score via the function score query (see “Function score query” in the Elasticsearch Guide [7.10]).
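For the two alternatives mentioned here, the request bodies might look roughly like the following Python sketch. As before, `name` and `store.keyword` are assumed field names, the function names are hypothetical, and the seed, bucket count and `boost_mode` are arbitrary choices for illustration.

```python
def random_score_query(text, seed=42):
    """Body that mixes a random factor into the relevance score."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"name": text}},
                # With a seed, 7.x expects a field to derive the random value from.
                "random_score": {"seed": seed, "field": "_seq_no"},
                # Multiply the match score by the random value.
                "boost_mode": "multiply",
            }
        }
    }


def top_hit_per_store_query(text, store_field="store.keyword", max_stores=10):
    """Body that returns the best-scoring document for each store bucket."""
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {"match": {"name": text}},
        "aggs": {
            "by_store": {
                "terms": {"field": store_field, "size": max_stores},
                "aggs": {"best": {"top_hits": {"size": 1}}},
            }
        },
    }
```

Note that the random score only shuffles the results; it does not guarantee one hit per store. The terms buckets are ordered by document count by default, so the store order there is not a relevance order either.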
