Help with an AND query match

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

OpenSearch: 1.3.3
Dashboard: 1.2.0
Server: Ubuntu 22.04 LTS server
Browser: Chrome 112

Describe the issue:
I’ve got a document with download links like this:

          "download_link" : [
            {
              "link" : "https://domain1.com/path/to/file",
              "domain" : "domain1.com"
            },
            {
              "link" : "https://domain2.com/path/to/file",
              "domain" : "domain2.com"
            }
          ],

What I’m looking to do is to create a query that matches BOTH links. I have the following:

{
  "query": {
    "bool": {
        "should" : [
          {
           "term": {
             "download_link.link.keyword": "https://domain2.com/path/to/file"
           }
         },
         {
           "term": {
             "download_link.link.keyword": "https://domain1.com/path/to/file"
           }
         }
       ],
       "minimum_should_match": 2
     }
   }
}

But this returns no results. My guess is because the first link is not equal to both links.

Can someone help me out with this query?

Thanks!

Hi @thedraketaylor

Could you please send your query request which can be sent in DevTools?

Documents that do not match every query in a should clause can still be returned as hits, unless minimum_should_match requires all of them. Do you need all hits or only hits that match your download_link value?

Hi @Eugene7 ,

Here’s my current query. I did change the should to a must. I want only those documents that have both links to be returned. I think I’m on the right track, just need a little push in the right direction…


{
  "query": {
    "nested": {
      "path": "download_link",
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "download_link.link.keyword": {
                  "value": "link1"
                }
              }
            },
            {
              "term": {
                "download_link.link.keyword": {
                  "value": "link2"
                }
              }
            }
          ]
        }
      }
    }
  }
}

Thanks!

I also need an example of your document. Could you please send me a response for the query below?
Please change or delete any sensitive data.

GET your-index-name/_search
{
  "size": 1, 
   "query" : {
        "match_all" : {}
    }
}

Hi @Eugene7,

I have to sanitize the contents as it’s NSFW, but here is a document:

      {
        "_index" : "forumpost-test",
        "_type" : "_doc",
        "_id" : "1aqWTIIBhi7v7dX3NYnH",
        "_score" : 1.0,
        "_source" : {
          "id" : "ysVxTIIBZYfErf_W4dWQ",
          "username" : "username1",
          "username_link" : "https://domain1.com/path/to/user",
          "post_number" : "20121557",
          "thanks" : "",
          "post_images" : [
            {
              "link" : "https://link2.com/path/to/file",
              "image" : "https://link2.com/path/to/image",
              "domain" : "link2.com"
            },
            {
              "link" : "https://link1.com/path/to/file",
              "image" : "https://link1.com/path/to/image",
              "domain" : "link1.com"
            }
          ],
          "download_link" : [
            {
              "link" : "https://link2.com/path/to/file",
              "domain" : "link2.com"
            },
            {
              "link" : "https://link1.com/path/to/file",
              "domain" : "link1.com"
            }
          ],
          "domain" : "https://domain1",
          "post_url" : "https://domain1.com/path/to/post",
          "scraped_time" : 1658939637,
          "post_text" : "text from the post"
        }
      }

Also, Here is the mapping for the index:

{
  "forumpost-test" : {
    "mappings" : {
      "properties" : {
        "created" : {
          "type" : "date",
          "format" : "epoch_second"
        },
        "domain" : {
          "type" : "keyword"
        },
        "download_link" : {
          "type" : "nested",
          "properties" : {
            "domain" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "link" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "post_images" : {
          "type" : "nested",
          "properties" : {
            "domain" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "image" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "link" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "post_number" : {
          "type" : "integer"
        },
        "post_text" : {
          "type" : "text"
        },
        "post_url" : {
          "type" : "keyword"
        },
        "scraped_time" : {
          "type" : "date",
          "format" : "epoch_second"
        },
        "thanks" : {
          "type" : "integer"
        },
        "title" : {
          "type" : "text"
        },
        "username" : {
          "type" : "keyword"
        },
        "username_link" : {
          "type" : "keyword"
        }
      }
    }
  }
}

The query below will find documents that have “download_link.link”: “https://link2.com/path/to/file” and “download_link.link”: “https://link1.com/path/to/file”. Is this solution good for you?

GET forumpost-test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match_phrase": {
            "download_link.link": "https://link2.com/path/to/file"
          }
        },
        {
          "match_phrase": {
            "download_link.link": "https://link1.com/path/to/file"
          }
        }
      ]
    }
  }
}

Thanks @Eugene7,

But sadly, that doesn’t work for me.

I ran that query, but I got no results.

I’m having better luck with this query:

GET forumpost-test/_search
{
  "query": {
    "nested": {
      "path": "download_link",
      "query": {
        "bool": {
          "should": [
            { "match": { "download_link.link.keyword": "<link 1>" }},
            { "match": { "download_link.link.keyword": "<link 2>" }}
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}

I know it’s using should, but in the few tests I’ve run, it seems to be working. I still want to get it working with “must”, though.

I think you’ll need a bool->must with two clauses. Each clause would be a nested query on download_link wrapping a query on download_link.link.keyword, one for each link. I would use a term query here since it’s an exact match.
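
As a rough illustration of that shape, here is a minimal Python sketch that builds the request body: an outer bool->must with one nested term query per link. The helper name both_links_query is hypothetical (it is not part of any OpenSearch client), and the index, path, and field names are taken from the mapping posted earlier in this thread.

```python
import json


def both_links_query(links, path="download_link",
                     field="download_link.link.keyword"):
    """Build a search body that requires one nested match per link.

    Hypothetical helper for illustration only. Each link gets its own
    nested clause, so each one may be satisfied by a different child.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": path,
                            "query": {"term": {field: link}},
                        }
                    }
                    for link in links
                ]
            }
        }
    }


body = both_links_query([
    "https://link1.com/path/to/file",
    "https://link2.com/path/to/file",
])

# Paste the printed JSON under "GET forumpost-test/_search" in DevTools.
print(json.dumps(body, indent=2))
```

Since scoring probably does not matter for an exact-match lookup like this, swapping "must" for "filter" in the outer bool should return the same documents.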

In plain English, you’ll search for “forumpost” documents that have both a “download_link” child that points to link1 and another one that points to link2. I think your last query doesn’t work if you change it to must because you’d be asking for forumposts with a single child that has both links, but you have one link per child.
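
To make the per-child behaviour concrete, here is a small pure-Python simulation. It does not talk to OpenSearch; it only models the rule that a nested clause matches when at least one child satisfies the inner query by itself. The function names are made up for this sketch, and the sample data mirrors the sanitized document above.

```python
LINK1 = "https://link1.com/path/to/file"
LINK2 = "https://link2.com/path/to/file"

both = {"download_link": [
    {"link": LINK2, "domain": "link2.com"},
    {"link": LINK1, "domain": "link1.com"},
]}
only_link1 = {"download_link": [
    {"link": LINK1, "domain": "link1.com"},
]}


def nested_matches(doc, child_predicate, path="download_link"):
    """Simplified model: a nested clause matches if any single child
    satisfies the whole inner query on its own."""
    return any(child_predicate(child) for child in doc.get(path, []))


def one_nested_with_must(doc):
    # One nested query, inner must on both links:
    # a single child would need to hold both values at once.
    return nested_matches(
        doc, lambda c: c["link"] == LINK1 and c["link"] == LINK2)


def one_nested_with_should(doc):
    # One nested query, inner should with minimum_should_match 1:
    # any child with either link is enough.
    return nested_matches(
        doc, lambda c: c["link"] == LINK1 or c["link"] == LINK2)


def two_nested_under_must(doc):
    # Outer must with two nested queries:
    # each link may be found in a different child.
    return (nested_matches(doc, lambda c: c["link"] == LINK1)
            and nested_matches(doc, lambda c: c["link"] == LINK2))


for name, doc in (("both", both), ("only_link1", only_link1)):
    print(name, one_nested_with_must(doc),
          one_nested_with_should(doc), two_nested_under_must(doc))
# both False True True
# only_link1 False True False
```

In this model the should variant also accepts a document that has only one of the two links, which is why it can look correct in a few tests while still not enforcing "both".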

Alternatively, you can pull the links out into the main “forumpost” document as an array, and then you can do a must for both values on that field. It would be much faster, but if you later want to build more complex queries, they may not work on the flattened field.
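
A minimal sketch of that alternative, under some assumptions: the top-level field name download_links_flat is invented for this example, it is assumed to be mapped as keyword, and the copying is assumed to happen in the indexing code before the document is sent to OpenSearch. Both helper names are hypothetical.

```python
def flatten_links(source, path="download_link",
                  target="download_links_flat"):
    """Return a copy of the document with every child link copied into a
    top-level array field (field name is an assumption for this sketch)."""
    flat = dict(source)
    flat[target] = [child["link"] for child in source.get(path, [])]
    return flat


def both_links_flat_query(links, field="download_links_flat"):
    """Require every link on the flattened field; no nested wrapper needed.

    Assumes the field is mapped as keyword so term does an exact match.
    """
    return {
        "query": {
            "bool": {
                "must": [{"term": {field: link}} for link in links]
            }
        }
    }


doc = flatten_links({
    "download_link": [
        {"link": "https://link2.com/path/to/file", "domain": "link2.com"},
        {"link": "https://link1.com/path/to/file", "domain": "link1.com"},
    ]
})
print(doc["download_links_flat"])
print(both_links_flat_query(doc["download_links_flat"]))
```

The trade-off is that the flat array no longer records which domain belongs to which link, so a query that needs link and domain to match within the same child would still have to use the nested field.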

Hi @radu.gheorghe,

Thank you! I think this does the trick:

GET forumpost-test/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "download_link",
                "query": {
                  "bool": {
                    "must": {
                      "term": { "download_link.link.keyword": "link 1" }
                    }
                  }
                }
              }
            },
            {
              "nested": {
                "path": "download_link",
                "query": {
                  "bool": {
                    "must": {
                      "term": { "download_link.link.keyword": "Link 2" }
                    }
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}