Aggregation

KNearestNeighbours

The KNearestNeighbours class aggregates point data using the k-nearest neighbours algorithm. Each point is connected to its k nearest neighbours, where k is a value supplied by the user. These connections partition the points into disjoint sets (the connected components of the k-nearest neighbours graph), and the points in each disjoint set are aggregated into a single cluster.
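To illustrate the idea, here is a minimal sketch that is not GeoJikuu's implementation. It builds an undirected k-nearest neighbours graph by brute force and finds its disjoint sets with a union-find structure. The function name `knn_clusters` is hypothetical, and Euclidean distance on already-projected coordinates is assumed. The usage at the bottom runs it on the five projected rows printed by df.head() later on this page.

```python
import math
from typing import List, Sequence


def knn_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Hypothetical helper: label points by the connected components of a k-NN graph."""
    n = len(points)
    parent = list(range(n))  # union-find forest: each point starts in its own set

    def find(i: int) -> int:
        # Walk up to the root of i's set, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    k = min(k, n - 1)  # a point cannot have more than n - 1 neighbours
    for i in range(n):
        # Brute force, O(n^2 log n): sort all other points by distance to point i
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            union(i, j)  # the edge i-j merges the two disjoint sets

    # Relabel the set roots as consecutive cluster ids 0, 1, 2, ...
    ids: dict = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# The five projected rows shown by df.head() (three in Osaka, two in Tokyo)
coords = [
    (-0.5867252094096281, 0.5760951446437298, 0.5690939403658662),
    (-0.5865446454704655, 0.5761508222375707, 0.5692236896905267),
    (-0.5867908125787922, 0.5765169810552485, 0.5685989032948122),
    (-0.6201191587857982, 0.5241083019965597, 0.5837488472665938),
    (-0.6198530442409449, 0.5251996831772221, 0.5830501662256676),
]
print(knn_clusters(coords, k=1))  # [0, 0, 0, 1, 1]
```

Because the graph is treated as undirected, a point joins a cluster if it is among another point's k nearest neighbours or vice versa; larger values of k therefore merge clusters more readily.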

Importing the Class

The KNearestNeighbours class is located in GeoJikuu's aggregation.point_aggregators module:

from geojikuu.aggregation.point_aggregators import KNearestNeighbours

Coordinate Projection

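GeoJikuu's aggregation classes assume that input coordinates have already been projected to a linear coordinate system. For example, WGS 84 latitude and longitude values can be projected to Cartesian coordinates with the CartesianProjector class: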

import pandas as pd
from geojikuu.preprocessing.projection import CartesianProjector

# Create a projector for WGS 84 latitude/longitude coordinates
cartesian_projector = CartesianProjector("wgs84")

data = {
    "lat": [34.6870676, 34.696109, 34.6525807, 35.7146509, 35.6653623, 35.6856905],
    "lon": [135.5237618, 135.5121774, 135.5059984, 139.7963897, 139.7254906, 139.7514867],
    "value": [1, 2, 1, 5, 6, 3]
}

df = pd.DataFrame.from_dict(data)

# Project the (lat, lon) pairs to Cartesian coordinates
results = cartesian_projector.project(list(zip(df["lat"], df["lon"])))
df["cartesian_coordinates"] = results["cartesian_coordinates"]
# Factor for converting projected distances to kilometres
unit_conversion = results["unit_conversion"]
df.head()

	lat	lon	value	cartesian_coordinates
0	34.687068	135.523762	1	(-0.5867252094096281, 0.5760951446437298, 0.5690939403658662)
1	34.696109	135.512177	2	(-0.5865446454704655, 0.5761508222375707, 0.5692236896905267)
2	34.652581	135.505998	1	(-0.5867908125787922, 0.5765169810552485, 0.5685989032948122)
3	35.714651	139.796390	5	(-0.6201191587857982, 0.5241083019965597, 0.5837488472665938)
4	35.665362	139.725491	6	(-0.6198530442409449, 0.5251996831772221, 0.5830501662256676)
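The printed coordinates appear to be points on a unit sphere, with x = cos(lat)·cos(lon), y = cos(lat)·sin(lon) and z = sin(lat). The sketch below reproduces row 0 under that assumption. `to_unit_sphere` is a hypothetical helper, not part of GeoJikuu, and CartesianProjector may differ in details not visible from this output.

```python
import math
from typing import Tuple


def to_unit_sphere(lat_deg: float, lon_deg: float) -> Tuple[float, float, float]:
    """Hypothetical helper: map (lat, lon) in degrees to (x, y, z) on a unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


# Row 0 of the example data
x, y, z = to_unit_sphere(34.6870676, 135.5237618)
print(x, y, z)  # approximately -0.58673 0.57610 0.56909
```

Since these points lie on a sphere of radius 1, straight-line distances between them are measured in units of the sphere's radius, which is presumably why a unit_conversion factor is needed to express them in kilometres.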

For more information, see: Projection Classes

Creating a KNearestNeighbours Object

A KNearestNeighbours object is created by passing in a DataFrame and the label of the column that contains the coordinates:

knn = KNearestNeighbours(data=df, coordinate_label="cartesian_coordinates")

Aggregating

Once the object has been created, its data can be aggregated with the aggregate() method. The method takes a value for k (the number of nearest neighbours each point is connected to) and the desired aggregate type (e.g., mean or sum):

knn.aggregate(k=1, aggregate_type="mean")

Output: Aggregated 6 points into 2 clusters.

	value	midpoint	count	mbr
0	1.333333	(-0.586686889152962, 0.5762543159788497, 0.5689721777837351)	3	0.000468
1	4.666667	(-0.6199685163109833, 0.5246975627357471, 0.5833791301682658)	3	0.000712

In this example, the aggregation produced two clusters of three points each. The 'value' column holds the mean 'value' of the points in each cluster, 'midpoint' holds the cluster's midpoint (the mean of its coordinates), 'count' holds the number of points in the cluster, and 'mbr' holds the minimum bounding radius, i.e. the radius of the circle that encloses all of the cluster's points.
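To make the four columns concrete, the following sketch computes them for one cluster from its member points. It is a hypothetical illustration rather than GeoJikuu's code: `summarise_cluster` and the `AGGREGATORS` mapping of aggregate_type strings are invented names. The radius is taken as the largest distance from the midpoint to any member point; this reproduces the mbr values printed above, but it is an assumption about how mbr is defined and is not necessarily the smallest possible enclosing circle.

```python
import math
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

# Hypothetical mapping from aggregate_type strings to reducer functions
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "mean": statistics.mean,
    "sum": math.fsum,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def summarise_cluster(points: Sequence[Point], values: Sequence[float],
                      aggregate_type: str = "mean") -> dict:
    """Hypothetical helper: compute value, midpoint, count and mbr for one cluster."""
    count = len(points)
    # Midpoint: the coordinate-wise mean of the member points
    midpoint = tuple(sum(axis) / count for axis in zip(*points))
    # Radius: the largest distance from the midpoint to any member point
    mbr = max(math.dist(midpoint, p) for p in points)
    return {
        "value": AGGREGATORS[aggregate_type](list(values)),
        "midpoint": midpoint,
        "count": count,
        "mbr": mbr,
    }


# Rows 0-2 of the projected example data make up the first cluster
cluster_points = [
    (-0.5867252094096281, 0.5760951446437298, 0.5690939403658662),
    (-0.5865446454704655, 0.5761508222375707, 0.5692236896905267),
    (-0.5867908125787922, 0.5765169810552485, 0.5685989032948122),
]
summary = summarise_cluster(cluster_points, [1, 2, 1], aggregate_type="mean")
print(summary)
```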

Note that the midpoint and mbr columns are expressed in the projected coordinate system. The Cartesian midpoint can be converted back to latitude and longitude with the projector's inverse_project() method, and the minimum bounding radius can be converted to kilometres by multiplying it by unit_conversion:

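results = knn.aggregate(k=1, aggregate_type="mean")
# Convert the Cartesian midpoints back to (latitude, longitude)
results["midpoint"] = cartesian_projector.inverse_project(results["midpoint"])
# Convert the radius from projected units to kilometres
results["mbr"] = results["mbr"] * unit_conversion
results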

Output: Aggregated 6 points into 2 clusters.

	value	midpoint	count	mbr
0	1.333333	(34.678585988237245, 135.51397815796616)	3	2.982328
1	4.666667	(35.68857144588458, 139.7577815841674)	3	4.534667
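The inverse step can be sketched under the same unit-sphere assumption. Using atan2 to recover latitude also tolerates midpoints that sit slightly inside the sphere (the mean of points on a sphere is not itself on the sphere). The radius constant is an assumption: the printed results are consistent with unit_conversion being roughly 6371 km, Earth's mean radius, but in practice the value returned by CartesianProjector should be used. `from_unit_sphere` and `EARTH_RADIUS_KM` are hypothetical names.

```python
import math
from typing import Tuple

# Assumed Earth radius; in practice use the projector's unit_conversion value
EARTH_RADIUS_KM = 6371.0


def from_unit_sphere(x: float, y: float, z: float) -> Tuple[float, float]:
    """Hypothetical helper: map Cartesian (x, y, z) back to (lat, lon) in degrees."""
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


# Midpoint and mbr of the first cluster, copied from the aggregate() output
midpoint = (-0.586686889152962, 0.5762543159788497, 0.5689721777837351)
mbr = 0.000468  # rounded to six decimal places in the printed table

lat, lon = from_unit_sphere(*midpoint)
radius_km = mbr * EARTH_RADIUS_KM
print(lat, lon, radius_km)
```

Strictly, the mbr is a straight-line (chord) distance rather than a distance along the Earth's surface, but for clusters only a few kilometres across the two are practically identical, so a simple multiplication suffices.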
