PointDistribution
The PointDistribution class is used for quantifying the distribution of points in a dataset. When analysing the distribution of data, we are interested in the shape, centre, and spread of the subject variable. In the context of points distributed over space, this involves calculating statistics concerning the displacement of points in relation to each other or to some other point, such as their geographical midpoint.
Importing the Class
The PointDistribution class is located in GeoJikuu's descriptives.spatial_distribution module:
from geojikuu.descriptives.spatial_distribution import PointDistribution
Creating a PointDistribution Object
A PointDistribution object is created by passing in a list of (lat, lon) coordinates:
import pandas as pd

# Latitudes and longitudes in decimal degrees
data = {
    "lat": [34.6870676, 34.696109, 34.6525807, 35.7146509, 35.6653623, 35.6856905],
    "lon": [135.5237618, 135.5121774, 135.5059984, 139.7963897, 139.7254906, 139.7514867],
}
df = pd.DataFrame.from_dict(data)
# Pair the two columns into a list of (lat, lon) tuples
pd_analysis = PointDistribution(list(zip(df["lat"], df["lon"])))
Computing the Geographical Midpoint
Given a set of points on the surface of the Earth, their geographical midpoint is the point that reflects their most central location. In the case of two points, for example, the midpoint lies halfway along the line connecting them, so that it is equally distant from both. The geographical midpoint of the points held by a PointDistribution object can be computed by calling its geo_midpoint() method:
pd_analysis.geo_midpoint()
Output: (35.20208795638931, 137.6226886288883)
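For readers curious how such a midpoint can be obtained, the sketch below uses one common approach: convert each coordinate to a unit vector on a sphere, average the vectors, and convert the result back to latitude and longitude. This is an illustration under a spherical-Earth assumption, not necessarily how geo_midpoint() is implemented, and the function name cartesian_midpoint is hypothetical.

```python
import math


def cartesian_midpoint(points):
    """Approximate the geographic midpoint of (lat, lon) pairs in degrees.

    Hypothetical helper for illustration; assumes a spherical Earth.
    """
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        # Accumulate the unit vector that corresponds to this coordinate.
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    # Convert the averaged vector back to latitude and longitude.
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


points = [
    (34.6870676, 135.5237618), (34.696109, 135.5121774), (34.6525807, 135.5059984),
    (35.7146509, 139.7963897), (35.6653623, 139.7254906), (35.6856905, 139.7514867),
]
print(cartesian_midpoint(points))
```

For the six example points, this sketch should land close to the midpoint reported above, although small differences are possible if the library uses a different Earth model.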
Computing Pairwise Displacement Statistics
Displacement statistics are used to quantify the spatial distribution of a set of points. Such statistics can be computed by first calculating the displacement between each pair of points and then calculating the following statistics related to those measurements: mean, standard deviation, variance, and quartiles. For example:
mean_displacement = pd_analysis.mean_displacement()
displacement_std = pd_analysis.displacement_std()
displacement_variance = pd_analysis.displacement_variance()
displacement_quartiles = pd_analysis.displacement_quartiles()
print("Displacement Statistics")
print("-----------------------")
print(f"Mean: {mean_displacement}")
print(f"Standard Deviation: {displacement_std}")
print(f"Variance: {displacement_variance}")
print("Quartiles:")
print(f" MIN: {displacement_quartiles['MIN']}")
print(f" Q1: {displacement_quartiles['Q1']}")
print(f" MEDIAN: {displacement_quartiles['MEDIAN']}")
print(f" Q3: {displacement_quartiles['Q3']}")
print(f" MAX: {displacement_quartiles['MAX']}")
print(f" IQR: {displacement_quartiles['IQR']}")
print(f" RANGE: {displacement_quartiles['RANGE']}")
Output:
Displacement Statistics
-----------------------
Mean: 242.82341922845717
Standard Deviation: 201.3914252565384
Variance: 40558.5061668599
Quartiles:
MIN: 1.4603107237518165
Q1: 5.0254511373009425
MEDIAN: 397.76196647412286
Q3: 401.6004309540522
MAX: 407.36364342227023
IQR: 396.57497981675124
RANGE: 405.9033326985184
From the results above, we can make the following observations (among others):
- The mean displacement across all pairs of points is ~243 kilometres, with a standard deviation of ~201 kilometres.
- No two points are less than ~1.5 kilometres apart, and no two points are more than ~407 kilometres apart.
- 25% of the point pairs are within ~5 kilometres of each other, 50% are within ~398 kilometres of each other, and 75% are within ~402 kilometres of each other.
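- The median (~398 kilometres) is noticeably larger than the mean (~243 kilometres), indicating that the distribution of pairwise displacements is skewed to the left. In other words, most point pairs are far apart, while a smaller number of closely spaced pairs pulls the mean down.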
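The pairwise procedure described in this section can be sketched with the standard library alone. The example below assumes haversine (great-circle) distances on a sphere of radius 6371 km and uses the sample standard deviation; both are assumptions made for illustration, and haversine_km is a hypothetical helper rather than part of GeoJikuu.

```python
import math
import statistics
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


points = [
    (34.6870676, 135.5237618), (34.696109, 135.5121774), (34.6525807, 135.5059984),
    (35.7146509, 139.7963897), (35.6653623, 139.7254906), (35.6856905, 139.7514867),
]

# One distance per unordered pair: 6 points give 15 pairs.
pairwise = [haversine_km(p, q) for p, q in combinations(points, 2)]

mean_km = statistics.mean(pairwise)
std_km = statistics.stdev(pairwise)  # sample standard deviation (n - 1)
q1, median, q3 = statistics.quantiles(pairwise, n=4, method="inclusive")

print(f"Pairs: {len(pairwise)}")
print(f"Mean: {mean_km:.1f} km, Std: {std_km:.1f} km")
print(f"Q1: {q1:.1f} km, Median: {median:.1f} km, Q3: {q3:.1f} km")
```

The six points form two tight clusters, so the 15 distances split into a few very short ones and many long ones. That split is what produces a median well above the mean.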
Computing Reference Point Displacement Statistics
Rather than measuring the displacement of each point relative to every other point, we can measure the displacement of each point relative to a single reference point. In this case, the displacement statistics describe how far the points lie from that reference point.
This can be achieved by passing the desired reference point into the same PointDistribution functions used in the previous step.
Any arbitrary reference point can be given, but the example below uses the geo midpoint:
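# Use the geographical midpoint as the reference point
# (assumption: the reference point is passed as the first positional argument)
reference_point = pd_analysis.geo_midpoint()
mean_displacement = pd_analysis.mean_displacement(reference_point)
displacement_std = pd_analysis.displacement_std(reference_point)
displacement_variance = pd_analysis.displacement_variance(reference_point)
displacement_quartiles = pd_analysis.displacement_quartiles(reference_point)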
print("Displacement Statistics")
print("-----------------------")
print(f"Mean: {mean_displacement}")
print(f"Standard Deviation: {displacement_std}")
print(f"Variance: {displacement_variance}")
print("Quartiles:")
print(f" MIN: {displacement_quartiles['MIN']}")
print(f" Q1: {displacement_quartiles['Q1']}")
print(f" MEDIAN: {displacement_quartiles['MEDIAN']}")
print(f" Q3: {displacement_quartiles['Q3']}")
print(f" MAX: {displacement_quartiles['MAX']}")
print(f" IQR: {displacement_quartiles['IQR']}")
print(f" RANGE: {displacement_quartiles['RANGE']}")
Output:
Displacement Statistics
-----------------------
Mean: 200.83619768738552
Standard Deviation: 2.587762542918773
Variance: 6.696514978533435
Quartiles:
MIN: 197.35107354395333
Q1: 199.81969582141406
MEDIAN: 200.3040150988218
Q3: 201.90911063541782
MAX: 204.95568383948827
IQR: 2.08941481400376
RANGE: 7.60461029553494
From the results above, we can make the following observations (among others):
- The mean displacement of the points from the midpoint is ~201 kilometres, with a standard deviation of ~2.6 kilometres.
- No point is closer to the midpoint than ~197 kilometres, and no point is further away than ~205 kilometres.
- The quartiles, mean, and median are all approximately 200 kilometres. This indicates that the points lie at a consistent distance of roughly 200 kilometres from the midpoint.
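The reference-point variant can be sketched in the same way by measuring each point against a single fixed coordinate instead of against every other point. As before, the haversine distance, the 6371 km radius, and the helper name haversine_km are assumptions made for illustration. The reference coordinate is the midpoint value reported earlier in this document.

```python
import math
import statistics


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


points = [
    (34.6870676, 135.5237618), (34.696109, 135.5121774), (34.6525807, 135.5059984),
    (35.7146509, 139.7963897), (35.6653623, 139.7254906), (35.6856905, 139.7514867),
]
reference = (35.20208795638931, 137.6226886288883)  # midpoint reported above

# One distance per point, rather than one per pair of points.
to_reference = [haversine_km(p, reference) for p in points]

print(f"Mean: {statistics.mean(to_reference):.1f} km")
print(f"Std: {statistics.stdev(to_reference):.1f} km")
print(f"Min: {min(to_reference):.1f} km, Max: {max(to_reference):.1f} km")
```

Because every point is compared with the same coordinate, the result contains six distances rather than fifteen, and the two clusters no longer produce a mix of short and long values.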