MinMaxScaler
Scaling is often essential when performing an analysis or building a model that uses multiple numerical variables of different magnitudes. Without it, variables with larger ranges can dominate distance-based or gradient-based calculations and bias the output. The MinMaxScaler class addresses this by scaling a set of numerical values to lie between a lower and an upper bound while preserving the relative differences between the values within each variable. By default, MinMaxScaler scales values to be between 0 and 1.
Importing the Class
MinMaxScaler is located in GeoJikuu's preprocessing.normalisation module:
from geojikuu.preprocessing.normalisation import MinMaxScaler
Creating a MinMaxScaler Object
Creating a MinMaxScaler object requires passing in a list of values or a Pandas DataFrame column.
# Example 1: Using a list
values = [1, 2, 3, 4, 5]
min_max_scaler = MinMaxScaler(values)
# Example 2: Using a Pandas DataFrame
import pandas as pd
data = {"values": [0, 1, 2, 3, 4, 5]}
df = pd.DataFrame.from_dict(data)
min_max_scaler = MinMaxScaler(df["values"])
By default, MinMaxScaler will scale values between a lower bound of 0 and an upper bound of 1. This can be changed by setting the interval parameter when creating the object. For example:
values = [1, 2, 3, 4, 5]
min_max_scaler = MinMaxScaler(values, interval=[10, 20])
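To make the effect of a custom interval concrete, the sketch below applies the standard min-max formula in plain Python. It is an illustration only and not GeoJikuu's implementation: the helper name min_max_scale and its parameters are hypothetical, and it assumes the minimum and maximum are taken from the values used to create the scaler.

```python
def min_max_scale(x, data_min, data_max, lower=0.0, upper=1.0):
    """Map x from [data_min, data_max] onto [lower, upper].

    Hypothetical helper for illustration; not part of GeoJikuu.
    """
    if data_max == data_min:
        raise ValueError("data_min and data_max must differ")
    return lower + (x - data_min) * (upper - lower) / (data_max - data_min)


values = [1, 2, 3, 4, 5]
lo, hi = min(values), max(values)

# Scale every value onto the interval [10, 20].
scaled = [min_max_scale(v, lo, hi, lower=10, upper=20) for v in values]
print(scaled)  # [10.0, 12.5, 15.0, 17.5, 20.0]
```

Under this formula the smallest input maps to the lower bound, the largest to the upper bound, and the spacing between values is preserved proportionally.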
Scaling Values
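Once the MinMaxScaler object has been created, the scale() function can be used to scale a value. The outputs shown in this section are consistent with the scaler created from the DataFrame column in Example 2 (values 0 to 5) using the default interval: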
min_max_scaler.scale(2.5)
Output: 0.5
Conveniently, the scale() function can also be used to scale multiple values at once. This can be done by passing in a list:
min_max_scaler.scale([1.5, 2, 3.5])
Output: [0.3, 0.4, 0.7]
Or a DataFrame column:
data = {
"values": [1.5, 2, 3.5],
}
df = pd.DataFrame.from_dict(data)
df["scaled"] = min_max_scaler.scale(df["values"])
df.head()
|   | values | scaled |
|---|---|---|
| 0 | 1.5 | 0.3 |
| 1 | 2.0 | 0.4 |
| 2 | 3.5 | 0.7 |
Inverse Scaling
To convert values back to the initial scale, the inverse_scale() function is used:
min_max_scaler.inverse_scale(0.5)
Output: 2.5
As with the scale() function, inverse_scale() also handles lists and DataFrame columns.
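The inverse mapping can be sketched the same way. The code below is a plain-Python illustration of the standard inverse min-max formula, not GeoJikuu's implementation: the helper name inverse_min_max_scale is hypothetical, and the minimum of 0 and maximum of 5 are assumed to match the scaler used for the outputs above.

```python
def inverse_min_max_scale(y, data_min, data_max, lower=0.0, upper=1.0):
    """Map y from [lower, upper] back onto [data_min, data_max].

    Hypothetical helper for illustration; not part of GeoJikuu.
    """
    if upper == lower:
        raise ValueError("lower and upper must differ")
    return data_min + (y - lower) * (data_max - data_min) / (upper - lower)


# A single scaled value mapped back to the original scale.
original = inverse_min_max_scale(0.5, 0, 5)
print(original)  # 2.5

# A list of scaled values mapped back element by element.
restored = [inverse_min_max_scale(s, 0, 5) for s in [0.3, 0.4, 0.7]]
print([round(r, 6) for r in restored])
```

Because the inverse simply undoes the forward formula, scaling a value and then inverse scaling it should return the starting value, up to floating-point rounding.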