# Content

This notebook shows how to deal with numeric data on different scales by using normalization techniques.

Currently, there are three normalization techniques which are used most often in data preprocessing.

- L1 normalization – **Least absolute deviations | Least absolute errors**
- L2 normalization – **Least squares**
- MinMax normalization

Different normalization techniques serve different purposes for different datasets. A single normalization technique may be enough for a particular dataset, but it is also possible to combine two or more techniques on the same dataset.

Furthermore, a normalization technique can be applied to a dataset in two directions: by rows or by columns. If we want to scale all values within **one feature or one column**, we apply normalization **by column**. If we want to put **all features** of one observation on the same scale, we apply normalization **by row**. In some cases, we can also apply normalization by row and then by column, or vice versa.
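As a quick sketch of this difference (using `sklearn.preprocessing.normalize`, which is imported later in this notebook), the `axis` parameter selects the direction:

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
by_row = normalize(X, norm='l1', axis=1)  # each row sums to 1
by_col = normalize(X, norm='l1', axis=0)  # each column sums to 1
print(by_row.sum(axis=1))  # [1. 1.]
print(by_col.sum(axis=0))  # [1. 1.]
```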

**When do we need to normalize our data?**

- We want to seek relationships between features.
- We want to use regression or multivariate analysis further on. This is because these two types of analysis focus on exploring the relationships between features.
- We do **not** care much about the mean between or within features.

**When should we not normalize our data?**

- In experimental research, where we compare the mean of one treatment with the mean of another treatment.
- We are dealing with categorical data (more than 2 categories per feature).

# Libraries

For reading data: **Pandas**

For scratch implementation: **Numpy**

For existing implementation: **Scikit learn**

```
import pandas as pd
import numpy as np
from sklearn.preprocessing import normalize, MinMaxScaler
```

# Data preparation

In this section, data is prepared for different normalization techniques.

**Data summary**

- Age – symmetric data (measured in the range of 1-100)
- Number of songs listened per day – symmetric data (measured in the range of 5-15)
- Satisfaction score – symmetric data (measured in the range of 0-1)

```
data = pd.DataFrame(
[[20, 10, 0.7], [30, 8, 0.4], [25, 11, 0.4], [10, 5, 0.8], [40, 7, 0.6]],
columns=['Age', 'Songs Listened', 'Satisfaction']
)
display(data)
```

# Normalization techniques

## L1 normalization

As mentioned above, L1 normalization is known as **Least absolute deviations**. It takes the sum of the absolute values of all elements in a particular row or column. Then, it divides each element by that sum.

**Formula:** $y_i = \dfrac{x_i}{\sum_j |x_j|}$, where $x_j$ are the elements of the row or column being normalized.

**For example: (by row)**
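As an illustrative sketch (not part of the original worked example), normalizing the first row of the dataset, `[20, 10, 0.7]`, by its L1 norm:

```python
import numpy as np

row = np.array([20, 10, 0.7])
# L1 norm = sum of absolute values = 30.7
l1 = np.sum(np.abs(row))
normalized = row / l1
print(normalized)        # each element divided by 30.7
print(normalized.sum())  # the normalized row sums to 1 (up to floating point)
```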

**Questions:**

**When do we need to use L1 normalization?**

- We want a robust normalization for the model. (Robust normalization: normalization that performs well for data drawn from a wide range of probability distributions.)
- We do not care much about outliers. (Outliers: unusual data points in the distribution.)
- Stability is not required. (Stability: only a slight change in the fitted line when the data is adjusted.)
- We want to use sparse models, which generate mostly zero vectors.

**What are the advantages of using L1 normalization?**

- It can be used for feature selection (we can delete all features whose coefficient is 0).
- It optimizes the median.

**What are the disadvantages of using L1 normalization?**

- It does not perform well in models that require a small number of columns, because it creates a sparse matrix.
- It is not a good choice if the dataset contains many outliers, especially when we want to detect those outliers.

```
def l1_from_scratch(data, axis=1):
    ''' This is the implementation of L1 normalization from scratch
    Parameters
    ----------
    data: Data that needs to be normalized
    axis: A way to normalize (0 = by columns, 1 = by rows)
    Return
    ------
    L1-normalization result
    Notes
    -----
    1. We assume that the data contains only symmetric features (not categorical features)
    2. We assume that the data is clean (no missing values)
    '''
    cols = data.columns
    vals = data.values.astype(float)
    # Sum of absolute values along the chosen axis
    total = np.sum(np.abs(vals), axis=axis)
    # Divide each element by its column or row sum (broadcasting)
    if axis == 0:
        res = vals / total[None, :]
    else:
        res = vals / total[:, None]
    # Make DataFrame
    return pd.DataFrame(res, columns=cols)
```

```
print("INPUT:")
display(data)
print("OUTPUT (from scratch):")
display(l1_from_scratch(data, axis=1))
print("OUTPUT (sklearn):")
display(
pd.DataFrame(
normalize(data, norm='l1'), columns=data.columns
)
)
```

## L2 Normalization

As mentioned above, L2 normalization is known as **Least squares**. It takes the sum of the squares of all elements in a particular row or column. Then, it divides each element by the square root of that sum.

**Formula:** $y_i = \dfrac{x_i}{\sqrt{\sum_j x_j^2}}$, where $x_j$ are the elements of the row or column being normalized.

**For example: (by row)**
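As an illustrative sketch (not part of the original worked example), normalizing the first row of the dataset, `[20, 10, 0.7]`, by its L2 norm:

```python
import numpy as np

row = np.array([20, 10, 0.7])
# L2 norm = square root of the sum of squares = sqrt(400 + 100 + 0.49)
l2 = np.sqrt(np.sum(np.square(row)))
normalized = row / l2
print(normalized)                          # each element divided by ~22.37
print(np.sqrt(np.sum(normalized ** 2)))    # the normalized row has unit L2 norm
```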

**Questions:**

**When do we need to use L2 normalization?**

- Robustness of the model is not required.
- We do take outliers into consideration.
- Stability is required.
- We want to use non-sparse models, which generate mostly non-zero vectors. This also supports computational efficiency.

**What are the advantages of using L2 normalization?**

- It supports computational efficiency, so it is easy to use with gradient-based learning methods.
- It keeps the overall error small.
- It improves prediction performance, because almost all features are taken into account.
- It is a better choice if we want to detect outliers in the dataset.

**What are the disadvantages of using L2 normalization?**

- It cannot be used for performing feature selection.

```
def l2_from_scratch(data, axis=1):
    ''' This is the implementation of L2 normalization from scratch
    Parameters
    ----------
    data: Data that needs to be normalized
    axis: A way to normalize (0 = by columns, 1 = by rows)
    Return
    ------
    L2-normalization result
    Notes
    -----
    1. We assume that the data contains only symmetric features (not categorical features)
    2. We assume that the data is clean (no missing values)
    '''
    cols = data.columns
    vals = data.values.astype(float)
    # Square root of the sum of squares along the chosen axis
    sq_rt = np.sqrt(np.sum(np.square(vals), axis=axis))
    # Divide each element by its column or row L2 norm (broadcasting)
    if axis == 0:
        res = vals / sq_rt[None, :]
    else:
        res = vals / sq_rt[:, None]
    # Make DataFrame
    return pd.DataFrame(res, columns=cols)
```

```
print("INPUT:")
display(data)
print("OUTPUT (from scratch):")
display(l2_from_scratch(data, axis=1))
print("OUTPUT (sklearn):")
display(
pd.DataFrame(
normalize(data, norm='l2'), columns=data.columns
)
)
```

## Min-Max Normalization

Min-Max normalization is the way we map the entire range of values of a particular row or column to the range 0 to 1. The minimum value is mapped to 0, the maximum value is mapped to 1 and every other value is mapped to a decimal between 0 and 1.

**Formula:** $y = \dfrac{X_i - X_{min}}{X_{max} - X_{min}}$

```
X_min = The minimum value of a particular row or column
X_max = The maximum value of a particular row or column
X_i = The value of a particular unit.
y = The value after normalizing.
```

**For example: (by column)**
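As an illustrative sketch (not part of the original worked example), applying Min-Max normalization to the `Age` column of the dataset, `[20, 30, 25, 10, 40]`:

```python
import numpy as np

age = np.array([20, 30, 25, 10, 40])
# Map the column onto [0, 1]: the minimum (10) -> 0, the maximum (40) -> 1
x_min, x_max = age.min(), age.max()
normalized = (age - x_min) / (x_max - x_min)
print(normalized)  # [0.33333333 0.66666667 0.5        0.         1.        ]
```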

**Questions:**

**When do we need to use Min-Max normalization?**

- We want gradient descent to converge much faster (Logistic Regression, SVMs, perceptrons, neural networks, etc.).
- Our further analysis uses K-NN for classification problems or K-means for clustering problems.
- We want to find the directions that maximize the variance (LDA, PCA, Kernel-PCA).

**What are the advantages of using Min-Max normalization?**

- It makes no assumption about the distribution of the feature (the feature does not need to be normally distributed).
- It is well suited to features that fall within a bounded interval (e.g. pixel intensities fit within the 0-255 range).

**What are the disadvantages of using Min-Max normalization?**

- It is not a good choice if the dataset contains outliers and we want to detect them. One workaround for visualization is to transform the data into 2-dimensional space; however, this might not be an efficient solution if we rely on the value of the y-axis.

```
def mm_from_scratch(data, axis=0):
    ''' This is the implementation of Min-Max normalization from scratch
    Parameters
    ----------
    data: Data that needs to be normalized
    axis: A way to normalize (0 = by columns, 1 = by rows)
    Return
    ------
    Min-Max normalization result
    Notes
    -----
    1. We assume that the data contains only symmetric features (not categorical features)
    2. We assume that the data is clean (no missing values)
    '''
    cols = data.columns
    # Transpose first so that we always scale along the columns of vals
    if axis == 0:
        vals = data.values.astype(float)
    else:
        vals = data.T.values.astype(float)
    # Rescale each column to [0, 1] using its minimum and maximum
    for ind in range(vals.shape[1]):
        min_val = vals[:, ind].min()
        max_val = vals[:, ind].max()
        vals[:, ind] = (vals[:, ind] - min_val) / (max_val - min_val)
    # Transpose back if we normalized by rows
    if axis == 1:
        vals = vals.T
    # Make DataFrame
    return pd.DataFrame(vals, columns=cols)
```

```
print("INPUT:")
display(data)
print("OUTPUT (from scratch):")
display(mm_from_scratch(data, axis=0))
scaler = MinMaxScaler()
print("OUTPUT (sklearn):")
display(
pd.DataFrame(
scaler.fit_transform(data), columns=data.columns
)
)
```

Source code: Github (peterdu98)