How to find idf of a term relative to a subset of documents

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch 2.11

Describe the issue:

Is there a way to retrieve the idf only relative to a subset of documents, not the entire index?

I’m using the query below. With "explain": true it returns the idf of term-value in the explanation, but that idf is computed relative to the entire index, not just relative to the docs whose createTime falls in the interval [2024-01-01, 2024-02-01].

GET my-index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "createTime": {
              "gte": "2024-01-01",
              "lte": "2024-02-01"
            }
          }
        }
      ],
      "must": [
        {
          "match": {
            "textField": "term-value"
          }
        }
      ]
    }
  },
  "explain": true,
  "_source": false
}

Thank you!

I think the range filter and the match clause are evaluated independently and their results are then combined, so there is no way to restrict the idf to the documents matched by the range filter. Also, idf is computed per shard by default; the only alternative is index-wide statistics (for example with search_type=dfs_query_then_fetch), which still cover the whole index. You could try creating one index per month, so that each index contains only the documents from that interval.


This could work, thanks @gaobinlong.
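
If monthly indices are not an option, one possible workaround is to compute the subset idf on the client: count the documents in the date range, count the documents in the range that also contain the term, and plug both counts into the BM25 idf formula that Lucene uses, ln(1 + (N - n + 0.5) / (n + 0.5)). The following is only a sketch under assumptions: the helper names `bm25_idf` and `count_bodies` are made up for illustration, the index is assumed to use the default BM25 similarity, and the term is assumed to analyze to a single token (a value like term-value may be split into several tokens by the analyzer, each with its own idf).

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25 idf as used by Lucene: ln(1 + (N - n + 0.5) / (n + 0.5)).

    doc_count (N): number of documents in the subset.
    doc_freq  (n): number of those documents that contain the term.
    """
    if doc_count < 0 or doc_freq < 0 or doc_freq > doc_count:
        raise ValueError("expected 0 <= doc_freq <= doc_count")
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def count_bodies(text_field: str, term: str, date_field: str, gte: str, lte: str):
    """Build two request bodies for the _count API (hypothetical helper).

    The first counts every document in the date range (N); the second
    counts the documents in the range that also match the term (n).
    """
    date_range = {"range": {date_field: {"gte": gte, "lte": lte}}}
    subset = {"query": {"bool": {"filter": [date_range]}}}
    subset_with_term = {
        "query": {
            "bool": {"filter": [date_range, {"match": {text_field: term}}]}
        }
    }
    return subset, subset_with_term


if __name__ == "__main__":
    subset, subset_with_term = count_bodies(
        "textField", "value", "createTime", "2024-01-01", "2024-02-01"
    )
    # Send each body to GET my-index/_count and read "count" from the
    # responses; the numbers below are placeholders, not real results.
    n_docs, n_with_term = 1000, 10
    print(bm25_idf(n_docs, n_with_term))
```

This costs two extra _count requests per term, and the result may differ slightly from what explain would report for a dedicated index, since the counts here are exact while shard-level statistics are not scoped to the query.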