
Aggregation

DistanceBased

The DistanceBased class aggregates point data using distance-based clustering, a simple technique that groups points lying within a given distance threshold of one another. If the threshold is 30 kilometres, for example, then any two points no more than 30 kilometres apart are considered part of the same cluster. Once the clusters are established, the variables of their constituent points are aggregated.
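Read literally, the rule that any pair of points within the threshold belongs to the same cluster implies chaining: if A is within 30 km of B and B is within 30 km of C, all three share a cluster even when A and C are further apart. The sketch below assumes this single-linkage reading and implements it with a union-find structure. Whether GeoJikuu chains clusters this way is an assumption, and `distance_clusters` is a hypothetical helper, not part of the library.

```python
import math

def distance_clusters(points, threshold):
    """Label points so that any two within `threshold` share a cluster.

    Hypothetical helper (not GeoJikuu's implementation): links every close
    pair with a union-find structure, so clusters can chain (single linkage).
    """
    parent = list(range(len(points)))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points within the threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    # Renumber the roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]

# Points on a line, in kilometres: 0-20 and 20-45 are close, 45-200 is not.
print(distance_clusters([(0, 0), (20, 0), (45, 0), (200, 0)], 30))  # [0, 0, 0, 1]
```

The first and third points are 45 km apart, yet they end up in the same cluster because each is within 30 km of the second point.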

Importing the Class

The DistanceBased class is located in GeoJikuu's aggregation.point_aggregators module:

from geojikuu.aggregation.point_aggregators import DistanceBased

Coordinate Projection

GeoJikuu's aggregation classes assume that the input coordinates have already been projected to a linear coordinate system. In addition, the distance passed to the DistanceBased class's aggregate() function must be in the same unit as the chosen projection system, so you will also need the projection's unit conversion factor. For example:

import pandas as pd
from geojikuu.preprocessing.projection import CartesianProjector

cartesian_projector = CartesianProjector("wgs84")

data = {
    "lat": [34.6870676, 34.696109, 34.6525807, 35.7146509, 35.6653623, 35.6856905],
    "lon": [135.5237618, 135.5121774, 135.5059984, 139.7963897, 139.7254906, 139.7514867],
    "value": [1, 2, 1, 5, 6, 3]
}

df = pd.DataFrame.from_dict(data)

# Project the (lat, lon) pairs onto Cartesian coordinates
results = cartesian_projector.project(list(zip(df["lat"], df["lon"])))
df["cartesian_coordinates"] = results["cartesian_coordinates"]
# Factor for converting between kilometres and the projection's unit
unit_conversion = results["unit_conversion"]
df.head()
   lat        lon         value  cartesian_coordinates
0  34.687068  135.523762  1      (-0.5867252094096281, 0.5760951446437298, 0.5690939403658662)
1  34.696109  135.512177  2      (-0.5865446454704655, 0.5761508222375707, 0.5692236896905267)
2  34.652581  135.505998  1      (-0.5867908125787922, 0.5765169810552485, 0.5685989032948122)
3  35.714651  139.796390  5      (-0.6201191587857982, 0.5241083019965597, 0.5837488472665938)
4  35.665362  139.725491  6      (-0.6198530442409449, 0.5251996831772221, 0.5830501662256676)
For more information, see: Projection Classes
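Each projected coordinate in the table above has length 1, so the values appear to be points on a unit sphere, and the unit conversion factor then plays the role of the Earth's radius. The sketch below illustrates that reading under two assumptions: the standard spherical formulas for x, y and z, and a factor equal to the mean Earth radius of 6371 km (the value CartesianProjector actually returns may differ slightly). `to_unit_sphere` and `EARTH_RADIUS_KM` are hypothetical names, not part of GeoJikuu.

```python
import math

# Assumption: the conversion factor is the mean Earth radius in kilometres.
# The value returned by CartesianProjector may differ slightly.
EARTH_RADIUS_KM = 6371.0

def to_unit_sphere(lat, lon):
    """Map latitude/longitude in degrees to (x, y, z) on a unit sphere."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))

# First two Osaka points from the example DataFrame.
a = to_unit_sphere(34.6870676, 135.5237618)
b = to_unit_sphere(34.696109, 135.5121774)

# Kilometres -> projected units: divide by the conversion factor.
threshold_units = 100 / EARTH_RADIUS_KM

# Projected units -> kilometres: multiply by the conversion factor.
separation_km = math.dist(a, b) * EARTH_RADIUS_KM

print(math.dist(a, b) <= threshold_units)  # True
print(round(separation_km, 1))
```

The two points are roughly 1.5 km apart. Note that `math.dist` measures the straight-line (chord) distance through the sphere, which is slightly shorter than the great-circle distance; at a scale of 100 km the difference is negligible.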

Creating a DistanceBased Object

A DistanceBased object is created by passing in a DataFrame and the label of the column that contains the coordinates:

distance_based = DistanceBased(data=df, coordinate_label="cartesian_coordinates")

Aggregating

Once the object has been created, the input data can be aggregated with the aggregate() function, which takes a distance threshold and the desired aggregate type (e.g. mean or sum). Because the threshold must be in the projection system's unit, convert it from kilometres by dividing by the unit_conversion variable. Here is an example of aggregating the points by mean with a distance threshold of 100 kilometres:

distance_based.aggregate(distance=100/unit_conversion, aggregate_type="mean")
Output: Aggregated 6 points into 2 clusters.
   value     midpoint                                                       count  mbr
0  1.333333  (-0.586686889152962, 0.5762543159788497, 0.5689721777837351)   3      0.000468
1  4.666667  (-0.6199685163109833, 0.5246975627357471, 0.5833791301682658)  3      0.000712

In this example, the aggregation produced two clusters of three points each. The 'value' column holds the mean 'value' of the points in each cluster, 'midpoint' is the cluster's midpoint (the mean of its coordinates), 'count' is the number of points in the cluster, and 'mbr' is the minimum bounding radius: the radius of the circle that encloses all of the cluster's points.
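To make the columns concrete, the sketch below builds one output row for a cluster. It assumes that 'midpoint' is the per-axis mean of the projected coordinates and that 'mbr' is the largest distance from that midpoint to a member point (a circle centred on the midpoint, which is not necessarily the smallest possible enclosing circle). Applied to the three Osaka points from the projected table, it reproduces the midpoint and the 0.000468 mbr of the first output row, which suggests, but does not confirm, that GeoJikuu computes them this way. `summarise_cluster` and the `AGGREGATORS` mapping are hypothetical; the set of aggregate types the real aggregate() accepts may differ.

```python
import math
import statistics

# Hypothetical aggregate_type names; GeoJikuu's accepted set may differ.
AGGREGATORS = {"mean": statistics.mean, "sum": sum, "median": statistics.median,
               "min": min, "max": max}

def summarise_cluster(points, values, aggregate_type="mean"):
    """Build one output row (value, midpoint, count, mbr) for a cluster.

    Assumptions: midpoint is the per-axis mean of the coordinates and
    mbr is the largest distance from that midpoint to a member point.
    """
    count = len(points)
    midpoint = tuple(sum(axis) / count for axis in zip(*points))
    mbr = max(math.dist(midpoint, p) for p in points)
    value = AGGREGATORS[aggregate_type](values)
    return value, midpoint, count, mbr

# The three Osaka points (rows 0 to 2) from the projected DataFrame above.
osaka = [
    (-0.5867252094096281, 0.5760951446437298, 0.5690939403658662),
    (-0.5865446454704655, 0.5761508222375707, 0.5692236896905267),
    (-0.5867908125787922, 0.5765169810552485, 0.5685989032948122),
]
value, midpoint, count, mbr = summarise_cluster(osaka, [1, 2, 1])
print(count, round(value, 6), round(mbr, 6))  # 3 1.333333 0.000468
```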

Note that the midpoint and mbr columns are expressed in the projected coordinate system. The Cartesian midpoint can be converted back to latitude and longitude with the projector's inverse_project() function, and the minimum bounding radius can be converted to kilometres by multiplying it by unit_conversion:

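# Aggregate, then convert the results back out of the projected system
results = distance_based.aggregate(distance=100/unit_conversion, aggregate_type="mean")
results["midpoint"] = cartesian_projector.inverse_project(results["midpoint"])  # Cartesian -> (lat, lon)
results["mbr"] = results["mbr"] * unit_conversion  # projected units -> kilometres
results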
Output: Aggregated 6 points into 2 clusters.
   value     midpoint                                   count  mbr
0  1.333333  (34.678585988237245, 135.51397815796616)   3      2.982328
1  4.666667  (35.68857144588458, 139.7577815841674)     3      4.534667
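For readers curious about what the inverse conversion involves, the sketch below applies the standard formulas latitude = asin(z / r) and longitude = atan2(y, x) to the first cluster's Cartesian midpoint. It normalises the point first, because the mean of several unit vectors lies slightly inside the unit sphere; whether inverse_project() does the same is an assumption, though the effect on latitude at this scale is negligible. As in the projection sketch, the 6371 km conversion factor is an assumption, and `to_lat_lon` is a hypothetical helper rather than GeoJikuu's code.

```python
import math

# Assumption: mean Earth radius in km as the conversion factor; the value
# CartesianProjector returns may differ slightly.
EARTH_RADIUS_KM = 6371.0

def to_lat_lon(x, y, z):
    """Convert a Cartesian point to (latitude, longitude) in degrees.

    The point is normalised first because the mean of several unit vectors
    lies slightly inside the unit sphere.
    """
    r = math.sqrt(x * x + y * y + z * z)
    lat = math.degrees(math.asin(z / r))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon

# Midpoint and (rounded) mbr of the first cluster in the projected output.
lat, lon = to_lat_lon(-0.586686889152962, 0.5762543159788497, 0.5689721777837351)
mbr_km = 0.000468 * EARTH_RADIUS_KM
print(round(lat, 4), round(lon, 4), round(mbr_km, 2))
```

Because the sketch starts from the rounded mbr of 0.000468, its kilometre figure differs from the table's 2.982328 in the fourth decimal place.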