Hello,
Thank you for checking out my post. I need some help writing an aggregation query! I started off with the docs on opensearch.org under the search experience section, and I used the edge-ngram-filter as described.
Everything is going great. But at the end, I want to group/aggregate the results. This is where I need help. The last phase.
I’m using OpenSearch to index UK postcode data as part of an auto-complete feature. Here’s what the data looks like:
{
  "address_1": "1",
  "address_2": "Ashwood Park",
  "address_3": "Bridge Of Don",
  "address_4": "ABERDEEN",
  "postcode": "AB22 8PR",
  "point": "POINT(57.200058,-2.122866)",
  "latitude": "57.200058",
  "longitude": "-2.122866"
}
There are about 33 million docs.
Using some indexing examples from the OpenSearch.org website, here is the index I created, with filters for “edge-ngram-filter” and “lowercasing”:
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "analysis": {
        "analyzer": {
          "autocomplete": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "edge_ngram_filter"]
          }
        },
        "filter": {
          "edge_ngram_filter": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 12
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "address_1": {
        "type": "text",
        "analyzer": "autocomplete"
      },
      "address_2": {
        "type": "text",
        "analyzer": "autocomplete"
      },
      "address_4": {
        "type": "text",
        "analyzer": "autocomplete"
      },
      "postcode": {
        "type": "text",
        "analyzer": "autocomplete"
      },
      "point": {
        "type": "geo_point"
      },
      "latitude": {
        "type": "double"
      },
      "longitude": {
        "type": "double"
      }
    }
  }
}
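Since the end goal is grouping, one thing to keep in mind about this mapping: a terms or cardinality aggregation generally cannot run on an analyzed text field (fielddata is off by default, and with edge n-grams the buckets would be token fragments anyway). A common approach is to add a keyword sub-field next to each text field. Below is a minimal Python sketch that adds one to a mapping's properties. The helper name with_keyword_subfields and the sub-field name keyword are my own choices for illustration, and existing documents would most likely need reindexing before the new sub-field is populated.

```python
import copy
import json


def with_keyword_subfields(properties, field_names):
    """Return a copy of mapping "properties" with a keyword sub-field on each named field."""
    updated = copy.deepcopy(properties)  # leave the caller's mapping untouched
    for name in field_names:
        # "fields" is the multi-field section of a mapping; "keyword" is a hypothetical sub-field name.
        subfields = updated[name].setdefault("fields", {})
        subfields["keyword"] = {"type": "keyword"}
    return updated


# Two of the text fields from the mapping in this post.
properties = {
    "postcode": {"type": "text", "analyzer": "autocomplete"},
    "address_4": {"type": "text", "analyzer": "autocomplete"},
}

mapped = with_keyword_subfields(properties, ["postcode", "address_4"])
print(json.dumps(mapped, indent=2))
```

With something like this in place, aggregations would target postcode.keyword and address_4.keyword, while the autocomplete search keeps using the analyzed text fields.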
My assignment requires me to search over 4 or 5 “weighted” fields:
- “postcode^8”
- “address_2^4”
- “address_3^2”
- “address_4^6”
Therefore, I am using a multi_match query body as follows:
{
  "from": 0,
  "size": 100,
  "query": {
    "multi_match": {
      "query": "AB15 8PS",
      "fields": [
        "postcode^6",
        "address_2",
        "address_3^2",
        "address_4^4"
      ]
    }
  },
  "highlight": {
    "fields": {
      "postcode": {},
      "address_4": {},
      "address_3": {},
      "address_2": {}
    }
  }
}
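For anyone reproducing this, here is a minimal Python sketch of how a body like the one above can be built from a field-to-weight table, so the boosts and the highlight fields stay in sync. The function name build_multi_match is hypothetical, and the weights used are the ones from the bulleted list earlier in this post; swap in whichever boosts you actually run with.

```python
import json


def build_multi_match(text, weights, size=100):
    """Build a multi_match search body with per-field boosts and matching highlights."""
    # A boost is written as "field^boost"; a boost of 1 is the default and can be left off.
    fields = [name if boost == 1 else f"{name}^{boost}" for name, boost in weights.items()]
    return {
        "from": 0,
        "size": size,
        "query": {"multi_match": {"query": text, "fields": fields}},
        # Highlight every searched field, using the plain field name without the boost.
        "highlight": {"fields": {name: {} for name in weights}},
    }


weights = {"postcode": 8, "address_2": 4, "address_3": 2, "address_4": 6}
body = build_multi_match("AB15 8PS", weights)
print(json.dumps(body, indent=2))
```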
Everything works up to this point. The results come back as individual documents. But I need to group the _source docs and supply a unique postcode count for each group of results.
[
  {
    "_index": "postcode-index",
    "_type": "_doc",
    "_id": "53926",
    "_score": 66.87001,
    "_source": {
      "address_1": "5 Small Holdings Whitemyres",
      "address_2": "",
      "address_3": "Kingswells",
      "address_4": "ABERDEEN",
      "postcode": "AB15 8PS",
      "point": "POINT(57.148816,-2.192512)",
      "latitude": "57.148816",
      "longitude": "-2.192512"
    },
    "highlight": {
      "postcode": ["<em>AB15</em> <em>8PS</em>"]
    }
  },
  {
    "_index": "postcode-index-example4",
    "_type": "_doc",
    "_id": "53927",
    "_score": 66.87001,
    "_source": {
      "address_1": "6 Small Holdings Whitemyres",
      "address_2": "",
      "address_3": "Kingswells",
      "address_4": "ABERDEEN",
      "postcode": "AB15 8PS",
      "point": "POINT(57.148816,-2.192512)",
      "latitude": "57.148816",
      "longitude": "-2.192512"
    },
    "highlight": {
      "postcode": ["<em>AB15</em> <em>8PS</em>"]
    }
  }
]
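For the grouping step, one direction that might work is a terms aggregation with two sub-aggregations: cardinality for the unique postcode count and top_hits for a representative hit with its highlight. The Python sketch below only builds the request body. It rests on assumptions that are not in this post: that keyword sub-fields named address_3.keyword and postcode.keyword exist (the mapping shown does not define them), and that address_3 is the field to group by. The function name and the aggregation names groups, unique_postcodes, and example_hit are hypothetical.

```python
import json


def build_grouped_search(text, fields, group_field, postcode_field, groups=10):
    """Build a search body that buckets matches and counts distinct postcodes per bucket."""
    # Highlight the same fields that are searched, minus any "^boost" suffix.
    highlight_fields = {f.split("^")[0]: {} for f in fields}
    return {
        "size": 0,  # only the buckets are wanted here, not the flat hit list
        "query": {"multi_match": {"query": text, "fields": fields}},
        "aggs": {
            "groups": {
                # One bucket per distinct value of group_field, at most `groups` buckets.
                "terms": {"field": group_field, "size": groups},
                "aggs": {
                    # Approximate count of distinct postcodes inside each bucket.
                    "unique_postcodes": {"cardinality": {"field": postcode_field}},
                    # One representative document per bucket, with its highlight.
                    "example_hit": {
                        "top_hits": {"size": 1, "highlight": {"fields": highlight_fields}}
                    },
                },
            }
        },
    }


body = build_grouped_search(
    "AB15 8PS",
    ["postcode^6", "address_2", "address_3^2", "address_4^4"],
    group_field="address_3.keyword",    # assumed keyword sub-field
    postcode_field="postcode.keyword",  # assumed keyword sub-field
)
print(json.dumps(body, indent=2))
```

If this runs as intended, each entry under aggregations.groups.buckets in the response should carry the group key, a doc_count, unique_postcodes.value, and the sample hit under example_hit.hits.hits. Note that cardinality is an approximate count, and that terms buckets are ordered by document count by default rather than by relevance.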
To restate the goal: I need to group the results and send back 10 “grouped” results with their highlights, along with a unique count of postcodes for each group. Here is an example output that I am trying to create:
I could really use the help, ASAP. Happy to work with a contractor if necessary.
If you don’t know, please pass this on to someone you think might know?
Thank you!