Linear Discriminant Analysis (LDA) is a common dimensionality-reduction technique, used as a preprocessing step in machine learning and pattern-classification applications. The goal of this tutorial is to project a dataset onto a lower-dimensional space with good class separability.
In this tutorial, I follow the ideas from the publication "Linear discriminant analysis: A detailed tutorial". The aim is to build a solid intuition for how LDA works and to apply the technique in different applications.
We will explore the IRIS dataset, one of the best-known datasets in the pattern recognition literature. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
The goal of dimensionality-reduction techniques is to remove redundant and dependent features by transforming the data from a higher-dimensional space, which may suffer from the curse of dimensionality, to a lower-dimensional one.
W = SB / SW

where,
SB = between-class variance, i.e. the distance between the means of the different classes
SW = within-class variance
W = the ratio defining the transformation matrix

We need to maximize W, i.e. maximize the between-class variance (SB) and minimize the within-class variance (SW).
We want to project the original data matrix onto a lower-dimensional space. To achieve this goal, three steps need to be performed: compute the between-class variance (SB), compute the within-class variance (SW), and construct the lower-dimensional space that maximizes SB and minimizes SW.
Here, the "IRIS" dataframe has been created
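A minimal sketch of this step, assuming the dataset is loaded from scikit-learn's bundled copy (the original may have read it from a CSV instead); the column names are my own choice:

```python
# Sketch: build the IRIS dataframe from scikit-learn's bundled copy.
import pandas as pd
from sklearn.datasets import load_iris

features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris = load_iris()
IRIS = pd.DataFrame(iris.data, columns=features)          # 150 rows x 4 attributes
IRIS["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
```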
Here, we visualize the density plots of each attribute, grouped by class.
The petal_length and petal_width attributes show the best separation between the classes.
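A sketch of the density plots, one subplot per attribute with one curve per species (the 2x2 figure layout is my own; the original plots may have been arranged differently):

```python
import matplotlib
matplotlib.use("Agg")                      # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Recap: load the data as in the previous step.
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris = load_iris()
IRIS = pd.DataFrame(iris.data, columns=features)
IRIS["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# One density subplot per attribute, one curve per species.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, col in zip(axes.ravel(), features):
    for name, group in IRIS.groupby("species", observed=True):
        group[col].plot.density(ax=ax, label=str(name))
    ax.set_title(col)
axes[0, 0].legend()
fig.tight_layout()
```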
Use the groupby method to compute the class means
Compute the total mean (mean_total)
Count the samples in each class using the groupby method
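These three quantities can be computed with pandas as follows (the data-loading step is repeated so the snippet runs on its own):

```python
import pandas as pd
from sklearn.datasets import load_iris

# Recap: load the data as before.
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris = load_iris()
IRIS = pd.DataFrame(iris.data, columns=features)
IRIS["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

grouped = IRIS.groupby("species", observed=True)
mean_vectors = grouped[features].mean()    # one 4-D mean vector per class
mean_total = IRIS[features].mean()         # mean over all 150 samples
counts = grouped.size()                    # samples per class
```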
The between-class variance for class 1 (SB1) is given by,
SB1 = n1 * (M1 - M)T * (M1 - M)
where,
n1 = number of samples in class 1
M1 = mean of class 1 (a row vector)
M = total mean (i.e. mean_total)
The total between-class variance is the sum over the three classes,
SB = SB1 + SB2 + SB3
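The sum SB = SB1 + SB2 + SB3 can be sketched as a loop over the classes (the setup from the previous steps is repeated so the snippet is self-contained):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Recap: data, class means, total mean, class sizes.
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris = load_iris()
IRIS = pd.DataFrame(iris.data, columns=features)
IRIS["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
grouped = IRIS.groupby("species", observed=True)
mean_vectors = grouped[features].mean()
mean_total = IRIS[features].mean()
counts = grouped.size()

SB = np.zeros((4, 4))
for cls in mean_vectors.index:
    diff = (mean_vectors.loc[cls] - mean_total).to_numpy().reshape(1, -1)  # row vector M_c - M
    SB += counts[cls] * diff.T @ diff    # n_c * (M_c - M)^T (M_c - M), a 4 x 4 matrix
```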
The within-class variance for class 1 (SW1) is given by,
SW1 = d1T * d1
where,
d1 = matrix of differences between the samples of class 1 and the class mean M1
The total within-class variance is the sum over the three classes,
SW = SW1 + SW2 + SW3
Use the get_group method to fetch the data of each class and store it in matrix form
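A sketch of the SW computation using get_group, under the same assumptions as before (setup repeated so the snippet runs standalone):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Recap: data and class means.
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris = load_iris()
IRIS = pd.DataFrame(iris.data, columns=features)
IRIS["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
grouped = IRIS.groupby("species", observed=True)
mean_vectors = grouped[features].mean()

SW = np.zeros((4, 4))
for cls in mean_vectors.index:
    Xc = grouped.get_group(cls)[features].to_numpy()   # samples of one class in matrix form
    d = Xc - mean_vectors.loc[cls].to_numpy()          # deviations from the class mean
    SW += d.T @ d                                      # SW_c = d_c^T * d_c
```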
The dimensionality-reduction condition is met by maximizing the transformation criterion W, which is given by,
W = SB / SW
or, in matrix form,
W = SW^-1 * SB
Now, we need to find the plane along which the classes are best discriminated, i.e. the one that maximizes W. For this we compute the eigenvectors and eigenvalues of W: an eigenvector represents the direction of a plane, and its eigenvalue the corresponding magnitude. Since we want to maximize W, we select the eigenvectors with the highest eigenvalues.
Here, each eigenvalue (L) corresponds to a column of the eigenvector matrix (V).
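Putting the pieces together, a sketch of computing W = SW^-1 * SB and its eigendecomposition (np.linalg.eig returns the eigenvalues unsorted, so we sort them in descending order):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Recap of the previous steps so this snippet runs on its own.
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris = load_iris()
IRIS = pd.DataFrame(iris.data, columns=features)
IRIS["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
grouped = IRIS.groupby("species", observed=True)
mean_vectors = grouped[features].mean()
mean_total = IRIS[features].mean()
counts = grouped.size()

SB = np.zeros((4, 4))
SW = np.zeros((4, 4))
for cls in mean_vectors.index:
    diff = (mean_vectors.loc[cls] - mean_total).to_numpy().reshape(1, -1)
    SB += counts[cls] * diff.T @ diff                           # between-class part
    d = grouped.get_group(cls)[features].to_numpy() - mean_vectors.loc[cls].to_numpy()
    SW += d.T @ d                                               # within-class part

# Maximize W = SW^-1 * SB via its eigendecomposition.
W = np.linalg.inv(SW) @ SB
L, V = np.linalg.eig(W)                 # eigenvalues L, eigenvector columns V
order = np.argsort(L.real)[::-1]        # sort by eigenvalue, descending
L, V = L.real[order], V.real[:, order]
```

With 3 classes, SB has rank 2, so only the first two eigenvalues are nonzero.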
We want to reduce the dimension from 4D (4 attributes) to 2D, so we keep only 2 LDA subspace axes.
Classification plot 1
As can be seen, the first two eigenvalues are quite high compared to the others, so we use only the first two eigenvector columns. The first eigenvector column represents the LDA1 subspace and the second column the LDA2 subspace for classification plot 1.
Classification plot 2
Later, we will draw another plot, classification plot 2, built from the eigenvector columns with the highest and the lowest eigenvalue.
Access the first two columns of the eigenvector matrix
Compute the transformed output, Z
Create the dataframe
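The projection onto the first two eigenvector columns can be sketched as follows (the names W2, Z, and lda_df are my own; the setup is repeated for completeness):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Recap of the previous steps so this snippet runs on its own.
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris = load_iris()
IRIS = pd.DataFrame(iris.data, columns=features)
IRIS["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
grouped = IRIS.groupby("species", observed=True)
mean_vectors = grouped[features].mean()
mean_total = IRIS[features].mean()
counts = grouped.size()

SB = np.zeros((4, 4))
SW = np.zeros((4, 4))
for cls in mean_vectors.index:
    diff = (mean_vectors.loc[cls] - mean_total).to_numpy().reshape(1, -1)
    SB += counts[cls] * diff.T @ diff
    d = grouped.get_group(cls)[features].to_numpy() - mean_vectors.loc[cls].to_numpy()
    SW += d.T @ d

W = np.linalg.inv(SW) @ SB
L, V = np.linalg.eig(W)
order = np.argsort(L.real)[::-1]
L, V = L.real[order], V.real[:, order]

# Project the data onto the 2-D LDA subspace.
W2 = V[:, :2]                        # the two eigenvector columns with the largest eigenvalues
Z = IRIS[features].to_numpy() @ W2   # transformed output, 150 x 2
lda_df = pd.DataFrame(Z, columns=["LDA1", "LDA2"])
lda_df["species"] = IRIS["species"].to_numpy()
```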
The two classification plots are quite different: discriminating the classes along the LDA2 axis in plot 2 is very difficult.
From the above plot,