Server-side clustering of geo-points on a map with AWS OpenSearch

Hi,

We have an index in OpenSearch with 100k records that contain location info. We want to show all of them on a map on a website using clustering. What is the best practice for doing so?
Here are some online resources we have been reading:

Initially we tried to render the clusters in the frontend: we fetched the data from OpenSearch with a query and sent it to the browser, where a JavaScript library rendered the clusters. However, once the location count goes over 20-30k, pulling the locations from OpenSearch takes 10-15 seconds. The possible solution we see is to compute the clusters in OpenSearch itself and send only the resulting data from OpenSearch to the client browser.

Hopefully someone can offer some guidance.

Thanks,
Klavs

Hi,
We are looking into this. Please give us some time.

You might want to look into this documentation:

OpenSearch also supports the GeoTile aggregation (geotile_grid) on geo_points, but as of now I am not able to find documentation for it. I will post an example of the tile aggregation.
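
The idea behind tile aggregation can be sketched in Python. This is a minimal sketch that assumes geotile_grid follows the standard Web Mercator ("slippy map") tiling used by most web map libraries, labelling each bucket with a key of the form zoom/x/y; geotile_key is a hypothetical helper, not part of any OpenSearch client.

```python
import math

# Web Mercator cannot represent the poles; tile math clamps latitude to this limit.
MAX_MERCATOR_LAT = 85.0511287798


def geotile_key(lat: float, lon: float, zoom: int) -> str:
    """Return the "zoom/x/y" key of the map tile containing (lat, lon)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat = max(min(lat, MAX_MERCATOR_LAT), -MAX_MERCATOR_LAT)
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor(
        (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n
    )
    # lon = 180 or the clamped southern limit lands one past the last tile; clamp back in range.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"{zoom}/{x}/{y}"


# Points in the same tile share a key, so they would fall into the same bucket.
print(geotile_key(10.0, -10.0, 1))  # 1/0/0 (north-west quadrant at zoom 1)
```

If the keys do line up with the map library's tile coordinates, each bucket can be placed on the tile the map is already displaying.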

Hi @Klavs,
You can use a geohash_grid aggregation to cluster documents into buckets and show only the document count of each cluster. This significantly reduces latency because you are not fetching all documents from OpenSearch.

For higher precision (that is, higher zoom levels), make sure you provide a geo_bounding_box filter, as below; otherwise the aggregation creates a huge number of small cells, potentially millions of buckets.

POST /index_name/_search?size=0
{
  "aggregations": {
    "higher_zoom": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {
              "lat": 83.76,
              "lon": -81.2
            },
            "bottom_right": "POINT (3.0 42.2)"
          }
        }
      },
      "aggregations": {
        "1": {
          "geohash_grid": {
            "field": "location",
            "precision": 8
          }
        }
      }
    }
  }
}
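
To see why this shrinks the payload, here is a minimal Python sketch of standard geohash encoding (not OpenSearch code): points whose geohashes share the first `precision` characters land in the same geohash_grid cell, so the response carries one key and count per cell instead of one hit per document. The helpers geohash_encode and cluster_counts are hypothetical and only illustrate the grouping that OpenSearch performs server-side.

```python
from collections import Counter

# Geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Encode (lat, lon) as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value, bit_count = 0, 0
    use_lon = True  # bits alternate across the whole hash, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def cluster_counts(points, precision):
    """Count points per geohash cell: roughly the key/doc_count pairs geohash_grid returns."""
    return Counter(geohash_encode(lat, lon, precision) for lat, lon in points)


points = [(57.64911, 10.40744), (57.6, 10.4), (42.6, -5.6)]
print(cluster_counts(points, 3))  # Counter({'u4p': 2, 'ezs': 1})
```

Each extra character of precision splits a cell into 32 smaller ones, which is why high precision without a bounding box filter explodes the bucket count.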

The response contains a list of buckets, each keyed by its geohash value, together with the count of documents in that bucket.
If you want to show each bucket as a geo_point, you can add geo_centroid as a sub-aggregation within the bucket, as below. It calculates the centroid of all geo_points in the bucket, which you can then use as the cluster's position on the map.

{
  "aggs": {
    "filter_agg": {
      "filter": {
        "geo_bounding_box": {
          "ignore_unmapped": true,
          "location": {
            "top_left": {
              "lat": 90,
              "lon": -180
            },
            "bottom_right": {
              "lat": -90,
              "lon": 180
            }
          }
        }
      },
      "aggs": {
        "1": {
          "geohash_grid": {
            "field": "location",
            "precision": 3
          },
          "aggs": {
            "2": {
              "geo_centroid": {
                "field": "location"
              }
            }
          }
        }
      }
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  }
}
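
On the application side, each bucket can then become one map marker. A minimal Python sketch, assuming the response shape produced by the query above (aggregation names filter_agg, "1" and "2"); buckets_to_markers is a hypothetical helper, and the sample values are made up for illustration.

```python
from typing import Any, Dict, List


def buckets_to_markers(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten filter_agg -> "1" (geohash_grid) -> "2" (geo_centroid) into one marker per cluster."""
    buckets = response["aggregations"]["filter_agg"]["1"]["buckets"]
    markers = []
    for bucket in buckets:
        centroid = bucket["2"]["location"]  # average position of the points in this cell
        markers.append(
            {
                "geohash": bucket["key"],
                "count": bucket["doc_count"],  # number to show on the cluster icon
                "lat": centroid["lat"],
                "lon": centroid["lon"],
            }
        )
    return markers


# Illustrative response fragment; the keys and numbers are made up.
sample = {
    "aggregations": {
        "filter_agg": {
            "doc_count": 3,
            "1": {
                "buckets": [
                    {"key": "u4p", "doc_count": 2,
                     "2": {"location": {"lat": 57.62, "lon": 10.40}, "count": 2}},
                    {"key": "ezs", "doc_count": 1,
                     "2": {"location": {"lat": 42.6, "lon": -5.6}, "count": 1}},
                ]
            },
        }
    }
}
for marker in buckets_to_markers(sample):
    print(marker)
```

Placing markers at the centroid rather than the cell center keeps clusters visually close to where the points actually are.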

This functionality is available as part of the Coordinate Map visualization in OpenSearch Dashboards. I would recommend experimenting with your use case using a Coordinate Map. If you are interested in the query that Dashboards sends, you can use Inspect to see the request and response as well.

If your use case also includes fetching the documents for a given bucket and displaying them on the map, we can provide more details on that too.
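
One possible approach (an assumption, not necessarily what we would end up recommending) is to decode the bucket's geohash key into its cell bounds and reuse them in a geo_bounding_box query. geohash_bbox and bucket_documents_query are hypothetical helpers; the field name location matches the examples above. Because the box edges are shared with neighbouring cells, a point lying exactly on a border may be returned for both cells.

```python
import json
from typing import Any, Dict, Tuple

# Geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bbox(geohash: str) -> Tuple[float, float, float, float]:
    """Decode a geohash into its cell bounds as (top, left, bottom, right)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    use_lon = True  # the first bit refines longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, most significant first
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            use_lon = not use_lon
    return lat_hi, lon_lo, lat_lo, lon_hi


def bucket_documents_query(geohash: str, field: str = "location", size: int = 100) -> Dict[str, Any]:
    """Build a search body that returns the documents inside one geohash_grid cell."""
    top, left, bottom, right = geohash_bbox(geohash)
    return {
        "size": size,
        "query": {
            "geo_bounding_box": {
                field: {
                    "top_left": {"lat": top, "lon": left},
                    "bottom_right": {"lat": bottom, "lon": right},
                }
            }
        },
    }


print(json.dumps(bucket_documents_query("u4p"), indent=2))
```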

Please let us know if you have any more questions.

Hey @Vijay,
thank you very much for your example.

I have a similar case, but I need an additional aggregation to attach more information to the points:

  • 300-500k rows with coordinates in OpenSearch
  • need to show them as clusters on the map
  • need to include the ids that belong to each point
  • search.max_buckets: 65536

However, when I aggregate clusters for the whole map (as in your example, from 90, -180 to -90, 180), I get a ‘too_many_buckets_exception’ error.
I don’t have the option to change the search.max_buckets value.
Furthermore, I want to build the request correctly so that it doesn't overload OpenSearch.

Could you help me achieve this functionality with the following query?
Thank you!

{
  "aggs": {
    "filter_agg": {
      "filter": {
        "geo_bounding_box": {
          "ignore_unmapped": true,
          "location": {
            "top_left": {
              "lat": 90,
              "lon": -180
            },
            "bottom_right": {
              "lat": -90,
              "lon": 180
            }
          }
        }
      },
      "aggs": {
        "1": {
          "geohash_grid": {
            "field": "location",
            "precision": 3
          },
          "aggs": {
            "2": {
              "geo_centroid": {
                "field": "location"
              }
            },
            "3": {
              "terms": {
                "size": 10000, #maximum aggregation size for terms
                "field": "id"
              }
            }
          }
        }
      }
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  }
}
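
The arithmetic behind the error can be sketched roughly in Python. This assumes every document has a unique id, that each geohash cell and each id under it count as one bucket toward search.max_buckets, and that geo_centroid (a metric) adds none; OpenSearch's exact accounting may differ. estimated_buckets and largest_terms_size are hypothetical helpers, and the 500-cell distribution is made up.

```python
from typing import List, Optional


def estimated_buckets(docs_per_cell: List[int], terms_size: int) -> int:
    """Rough bucket count for geohash_grid -> terms(id), assuming every id is unique.

    Each geohash cell is one bucket, and its terms sub-aggregation adds one bucket per
    distinct id, capped at terms_size. geo_centroid is a metric and adds no buckets.
    """
    return sum(1 + min(n, terms_size) for n in docs_per_cell)


def largest_terms_size(docs_per_cell: List[int], max_buckets: int) -> Optional[int]:
    """Largest terms size whose estimate stays within max_buckets (None if the cells alone exceed it)."""
    if estimated_buckets(docs_per_cell, 0) > max_buckets:
        return None  # too many cells: lower the precision or shrink the bounding box
    lo, hi = 0, max(docs_per_cell, default=0)
    while lo < hi:  # binary search; the estimate grows monotonically with terms_size
        mid = (lo + hi + 1) // 2
        if estimated_buckets(docs_per_cell, mid) <= max_buckets:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Hypothetical distribution: 300,000 documents spread evenly over 500 cells.
cells = [600] * 500
print(estimated_buckets(cells, 10000))   # 300500, far above 65536
print(largest_terms_size(cells, 65536))  # 130
```

Under the unique-id assumption, the id buckets alone add up to the number of documents covered, so listing every id for 300-500k rows cannot fit under 65536 regardless of the terms size; the sketch only shows how far the ids per cluster would have to be capped. The per-cell counts could come from a first, cheaper request that runs the geohash_grid aggregation without the terms sub-aggregation.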