STKNearestNeighbours
The STKNearestNeighbours class aggregates point data using spatio-temporal k-nearest neighbours clustering. It extends KNearestNeighbours by selecting neighbours that are closest in both space and time, rather than in space only.
Importing the Class
The STKNearestNeighbours class is located in GeoJikuu's aggregation.point_aggregators module:
from geojikuu.aggregation.point_aggregators import STKNearestNeighbours
Coordinate Projection
GeoJikuu's aggregation classes assume that any input coordinates have already been projected to a linear coordinate system. For example:
import pandas as pd
from geojikuu.preprocessing.projection import CartesianProjector
cartesian_projector = CartesianProjector("wgs84")
data = {
"lat": [34.6870676, 34.696109, 34.6525807, 35.7146509, 35.6653623, 35.6856905],
"lon": [135.5237618, 135.5121774, 135.5059984, 139.7963897, 139.7254906, 139.7514867],
"date": ["19/03/1990", "19/03/1991", "19/03/1992", "19/03/1993", "19/03/1994", "19/03/1995"],
"value": [1, 2, 1, 5, 6, 3]
}
df = pd.DataFrame.from_dict(data)
results = cartesian_projector.project(list(zip(df["lat"], df["lon"])))
df["cartesian_coordinates"] = results["cartesian_coordinates"]
unit_conversion = results["unit_conversion"]
df.head()
| | lat | lon | date | value | cartesian_coordinates |
---|---|---|---|---|---|
0 | 34.687068 | 135.523762 | 19/03/1990 | 1 | (-0.5867252094096281, 0.5760951446437298, 0.5690939403658662) |
1 | 34.696109 | 135.512177 | 19/03/1991 | 2 | (-0.5865446454704655, 0.5761508222375707, 0.5692236896905267) |
2 | 34.652581 | 135.505998 | 19/03/1992 | 1 | (-0.5867908125787922, 0.5765169810552485, 0.5685989032948122) |
3 | 35.714651 | 139.796390 | 19/03/1993 | 5 | (-0.6201191587857982, 0.5241083019965597, 0.5837488472665938) |
4 | 35.665362 | 139.725491 | 19/03/1994 | 6 | (-0.6198530442409449, 0.5251996831772221, 0.5830501662256676) |
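The projected values in the table above are consistent with placing each latitude/longitude pair on a unit sphere. The following is a minimal sketch of that idea, not GeoJikuu's actual implementation: the function name `to_unit_sphere` is hypothetical, and the spherical model is an assumption inferred from the sample output.

```python
import math


def to_unit_sphere(lat_deg, lon_deg):
    """Map a latitude/longitude pair (in degrees) to a point on a unit sphere.

    Hypothetical helper for illustration only; the spherical model is an
    assumption, not a documented detail of CartesianProjector.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)


# First row of the example data
point = to_unit_sphere(34.6870676, 135.5237618)
print(point)
```

Distances between points on a unit sphere are expressed in units of the sphere's radius, which is why a separate scale factor such as unit_conversion is needed to report them in kilometres.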
Convert Dates to Timesteps
In the case of STKNearestNeighbours, it is also necessary to convert any dates to timesteps before running the aggregate() function:
from geojikuu.preprocessing.conversion_tools import DateConvertor
date_convertor = DateConvertor(date_format_in="%d/%m/%Y", date_format_out="%d/%m/%Y")
date_convertor.date_to_days(date="10/05/1995")
df['date_converted'] = df['date'].apply(date_convertor.date_to_days)
df.head()
| | lat | lon | date | value | cartesian_coordinates | date_converted |
---|---|---|---|---|---|---|
0 | 34.687068 | 135.523762 | 19/03/1990 | 1 | (-0.5867252094096281, 0.5760951446437298, 0.5690939403658662) | 726544 |
1 | 34.696109 | 135.512177 | 19/03/1991 | 2 | (-0.5865446454704655, 0.5761508222375707, 0.5692236896905267) | 726909 |
2 | 34.652581 | 135.505998 | 19/03/1992 | 1 | (-0.5867908125787922, 0.5765169810552485, 0.5685989032948122) | 727275 |
3 | 35.714651 | 139.796390 | 19/03/1993 | 5 | (-0.6201191587857982, 0.5241083019965597, 0.5837488472665938) | 727640 |
4 | 35.665362 | 139.725491 | 19/03/1994 | 6 | (-0.6198530442409449, 0.5251996831772221, 0.5830501662256676) | 728005 |
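The timesteps in the table above match the number of days elapsed since 01/01/0001 in the proleptic Gregorian calendar. As a rough illustration using only the standard library, the same numbers can be reproduced as follows. The helper `date_to_days_sketch` is hypothetical and is not part of GeoJikuu, and the reference date is an assumption inferred from the sample output.

```python
from datetime import datetime

# Assumed reference date, inferred from the sample output above
EPOCH = datetime(1, 1, 1)


def date_to_days_sketch(date, date_format="%d/%m/%Y"):
    """Return the whole number of days between EPOCH and the given date string."""
    return (datetime.strptime(date, date_format) - EPOCH).days


for date in ["19/03/1990", "19/03/1991", "19/03/1992"]:
    print(date, date_to_days_sketch(date))
```

Expressing dates as day counts gives every point a numeric time coordinate, so temporal gaps between points can be compared in the same way as spatial ones.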
Creating an STKNearestNeighbours Object
An STKNearestNeighbours object is created by passing in a DataFrame, the label of the column that contains the coordinates, and the label of the column that contains the timesteps:
st_knn = STKNearestNeighbours(data=df, coordinate_label="cartesian_coordinates", time_label="date_converted")
Aggregating
Once the object has been created, the input data can be aggregated using the aggregate() function. As input, the aggregate() function takes a value for k (the number of nearest neighbours) and the desired aggregate type (e.g., mean or sum):
st_knn.aggregate(k=1, aggregate_type="mean")
Output: Aggregated 6 points into 2 clusters.
| | value | date_converted | midpoint | count | mbr | temporal_extent |
---|---|---|---|---|---|---|
0 | 1.333333 | 726909.333333 | (-0.586686889152962, 0.5762543159788497, 0.5689721777837351, 726909.3333333334) | 3 | 0.000468 | (726544, 727275) |
1 | 4.666667 | 728005.000000 | (-0.6199685163109833, 0.5246975627357471, 0.5833791301682658, 728005.0) | 3 | 0.000712 | (727640, 728370) |
In this example, the aggregation resulted in two clusters. The 'value' column represents the mean 'value' of the points in each cluster, the 'date_converted' column represents the temporal midpoint of the cluster, the 'midpoint' column represents the spatial midpoint (mean coordinate) of the cluster, 'count' represents the number of points in the cluster, 'mbr' represents the minimum bounding radius of the circle that encapsulates all of the cluster's points, and 'temporal_extent' gives the earliest and latest timesteps among those points.
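The summary columns described above can be reproduced by hand for the first cluster. The sketch below is only an illustration of how such statistics might be computed once cluster membership is known. The function `summarise_cluster` is hypothetical, the cluster assignment is taken as given rather than computed, and treating 'mbr' as the largest distance from the midpoint to any member is an assumption that happens to agree with the sample output. The sketch keeps the midpoint to its three spatial components, whereas the sample output also appends the temporal midpoint.

```python
import math
from statistics import mean


def summarise_cluster(points):
    """Summarise a list of dicts with 'coord' (x, y, z), 'time' and 'value' keys."""
    coords = [p["coord"] for p in points]
    times = [p["time"] for p in points]
    # Spatial midpoint: component-wise mean of the member coordinates
    midpoint = tuple(mean(axis) for axis in zip(*coords))
    # Assumed definition of mbr: largest distance from the midpoint to a member
    mbr = max(math.dist(midpoint, c) for c in coords)
    return {
        "value": mean(p["value"] for p in points),
        "date_converted": mean(times),
        "midpoint": midpoint,
        "count": len(points),
        "mbr": mbr,
        "temporal_extent": (min(times), max(times)),
    }


# The first three rows of the example data, which form cluster 0
cluster_0 = [
    {"coord": (-0.5867252094096281, 0.5760951446437298, 0.5690939403658662),
     "time": 726544, "value": 1},
    {"coord": (-0.5865446454704655, 0.5761508222375707, 0.5692236896905267),
     "time": 726909, "value": 2},
    {"coord": (-0.5867908125787922, 0.5765169810552485, 0.5685989032948122),
     "time": 727275, "value": 1},
]

summary = summarise_cluster(cluster_0)
print(summary)
```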
You will notice that the 'midpoint' and 'mbr' columns pertain to the projected coordinate system. The midpoint (in Cartesian form) can easily be converted back to latitude and longitude, and the minimum bounding radius can easily be converted to kilometres:
results = st_knn.aggregate(k=1, aggregate_type="mean")
results["midpoint"] = cartesian_projector.inverse_project(results["midpoint"])
results["mbr"] = results["mbr"] * unit_conversion
results
Output: Aggregated 6 points into 2 clusters.
| | value | date_converted | midpoint | count | mbr | temporal_extent |
---|---|---|---|---|---|---|
0 | 1.333333 | 726909.333333 | (34.678585988237245, 135.51397815796616) | 3 | 2.982328 | (726544, 727275) |
1 | 4.666667 | 728005.000000 | (35.68857144588458, 139.7577815841674) | 3 | 4.534667 | (727640, 728370) |
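To make the conversion less opaque, here is a minimal sketch of what an inverse projection and a unit conversion could look like under a unit-sphere model. Both the helper `to_lat_lon` and the constant `EARTH_RADIUS_KM` are hypothetical; GeoJikuu supplies its own inverse_project() method and unit_conversion value, and the radius used here is only an assumption that gives figures close to the sample output.

```python
import math

# Assumed scale factor (mean Earth radius); GeoJikuu returns its own unit_conversion
EARTH_RADIUS_KM = 6371.0


def to_lat_lon(x, y, z):
    """Convert a Cartesian point to (latitude, longitude) in degrees.

    atan2 is used for both angles so that a midpoint lying slightly inside
    the unit sphere is still handled without explicit normalisation.
    """
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return (lat, lon)


# Spatial components of the first cluster's midpoint
midpoint = (-0.586686889152962, 0.5762543159788497, 0.5689721777837351)
lat, lon = to_lat_lon(*midpoint)
print(lat, lon)

# Minimum bounding radius of the first cluster, scaled to kilometres
mbr_km = 0.000468 * EARTH_RADIUS_KM
print(mbr_km)
```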
You will also notice that the 'date_converted' column, which contains the temporal midpoint of each cluster, is still expressed in timesteps. It can be converted back to a date via:
results['date_converted'] = results['date_converted'].apply(date_convertor.days_to_date)
results
| | value | date_converted | midpoint | count | mbr | temporal_extent |
---|---|---|---|---|---|---|
0 | 1.333333 | 19/03/1991 | (34.678585988237245, 135.51397815796616) | 3 | 2.982817 | (726544, 727275) |
1 | 4.666667 | 19/03/1994 | (35.68857144588458, 139.7577815841674) | 3 | 4.535411 | (727640, 728370) |
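For reference, the reverse conversion can be sketched with the standard library alone, again assuming that timesteps count days from 01/01/0001. The helper `days_to_date_sketch` is hypothetical and is not GeoJikuu's days_to_date(). Fractional timesteps, such as a temporal midpoint, simply fall within the calendar day that contains them.

```python
from datetime import datetime, timedelta

# Assumed reference date, inferred from the sample timesteps
EPOCH = datetime(1, 1, 1)


def days_to_date_sketch(days, date_format="%d/%m/%Y"):
    """Format the date that lies `days` days (possibly fractional) after EPOCH."""
    return (EPOCH + timedelta(days=days)).strftime(date_format)


print(days_to_date_sketch(726909.333333))
print(days_to_date_sketch(728005.0))
```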