Issue with using a nested query combined with a non-nested query

I have an index with the following mapping (relevant sections only):

        "created" : {
          "type" : "date",
          "format" : "epoch_second"
        },
        "download_link" : {
          "type" : "nested",
          "properties" : {
            "domain" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "link" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },

I want to run a query that checks whether an existing document already has the same download_links, so I can skip inserting duplicate documents. My query is as follows:

{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "created": {
              "gte": 1568942017,
              "lte": 1663203261
            }
          }
        },
        {
          "nested": {
            "path": "download_link",
            "query": {
              "bool": {
                "must": [
                  { "match": { "download_link.link": "<url>" } },
                  { "match": { "download_link.link": "<url2>" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

The issue is that it’s returning results that don’t contain the URLs I queried for. Can someone help me figure this out?

Thanks!

Try using download_link.link.keyword instead. I think the problem is that the match query uses the OR operator by default, and your download_link.link field is a text field (i.e. analyzed → each URL is split into multiple tokens). A document then matches if any one of those tokens matches (I think that even “https” would match, so any link will do).
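
For reference, here is a rough sketch of what the query could look like with the keyword sub-field. It swaps match for term so each URL is compared as one exact, unanalyzed value. It also assumes each download_link object holds a single link; under that assumption I've put each URL in its own nested clause, because all clauses inside one nested query have to match the same nested object. <url> and <url2> are the placeholders from the question, so treat this as a starting point rather than a tested fix:

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "created": {
              "gte": 1568942017,
              "lte": 1663203261
            }
          }
        },
        {
          "nested": {
            "path": "download_link",
            "query": {
              "term": { "download_link.link.keyword": "<url>" }
            }
          }
        },
        {
          "nested": {
            "path": "download_link",
            "query": {
              "term": { "download_link.link.keyword": "<url2>" }
            }
          }
        }
      ]
    }
  }
}
```

One thing to watch: the mapping sets ignore_above to 256 on the keyword sub-field, so links longer than 256 characters are not indexed there and this query would not find them.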

Thanks! I truly appreciate it!
