Linear Discriminant Analysis for Dimensionality Reduction

Linear Discriminant Analysis (LDA) is a very common dimensionality reduction technique, used as a preprocessing step in machine learning and pattern classification applications. The goal of this tutorial is to project a dataset onto a lower-dimensional space with good class separability.

In this tutorial, I have followed the ideas from the publication "Linear discriminant analysis: A detailed tutorial". The aim of that paper is to build a solid intuition for how LDA works and to show how the technique can be applied in different applications.

We will explore the IRIS dataset, perhaps the best-known dataset in the pattern recognition literature. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are NOT linearly separable from each other.

Attribute information
  1. sepal length in cm
  2. sepal width in cm
  3. petal length in cm
  4. petal width in cm
  5. class (species name):
    • Iris Setosa
    • Iris Versicolor
    • Iris Virginica

The goal of dimensionality reduction techniques is to remove redundant and dependent features by transforming the data from a higher-dimensional space, where the curse of dimensionality can become a problem, to a space with fewer dimensions.

Criterion for dimensionality reduction:

W = SB / SW

where,
SB = between-class variance, i.e. the distance between the means of the different classes
SW = within-class variance
W = the ratio that defines the transformation matrix

We need to maximize W, i.e. maximize the between-class variance (SB) and minimize the within-class variance (SW).

LDA technique

We want to project the original data matrix onto a lower-dimensional space. To achieve this goal, three steps need to be performed:

  1. Calculate the separability between the different classes, i.e. the between-class variance (SB)
  2. Calculate the distance between the mean and the samples of each class, i.e. the within-class variance (SW)
  3. Construct the lower-dimensional space that maximizes W

Load Libraries and Prepare DataFrame

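A minimal sketch of this step, assuming the dataset is loaded from scikit-learn's bundled copy and using sepal_length, sepal_width, petal_length and petal_width as column names:

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_iris

    # Load the data and build the "IRIS" dataframe: one column per attribute
    # plus the class (species) label
    features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
    data = load_iris()
    IRIS = pd.DataFrame(data.data, columns=features)
    IRIS['species'] = pd.Categorical.from_codes(data.target, data.target_names)
    print(IRIS.head())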

Here, the "IRIS" dataframe has been created

Visualize the Data

Here, we visualize the density plot of each attribute, grouped by class.

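One possible sketch of the density plots, drawing one kernel density curve per class for each attribute (the use of matplotlib and the pandas plot.kde helper is an assumption here):

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    for ax, col in zip(axes.ravel(), features):
        # One density curve per species for this attribute
        for name, group in IRIS.groupby('species'):
            group[col].plot.kde(ax=ax, label=name)
        ax.set_title(col)
        ax.legend()
    plt.tight_layout()
    plt.show()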

The petal_length and petal_width attributes show the best separation between the classes.

Mean values

Use the groupby method to compute the mean of each class

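A sketch of this step:

    # Class means: mean of each attribute for every species (3 x 4)
    mean_class = IRIS.groupby('species')[features].mean()
    print(mean_class)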

Compute the total mean (mean_total)

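One way to compute it:

    # Total mean: mean of each attribute over all 150 samples
    mean_total = IRIS[features].mean().values
    print(mean_total)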

Count the samples of each class using the groupby method

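For example:

    # Number of samples per class (50 each for the IRIS data)
    n_samples = IRIS.groupby('species').size()
    print(n_samples)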

Compute SB (between-class variance)

The between-class variance for class 1 (SB1) is given by,

SB1 = n1 * (M1 - M)^T * (M1 - M)

where,
n1 = number of samples in class 1
M1 = mean of class 1
M = total mean (i.e. mean_total)

and the total between-class variance is,

SB = SB1 + SB2 + SB3

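A sketch of the SB computation following the formula above:

    # Between-class variance: sum over classes of n_k * (M_k - M)^T (M_k - M)
    SB = np.zeros((len(features), len(features)))
    for name, Mk in mean_class.iterrows():
        diff = (Mk.values - mean_total).reshape(1, -1)
        SB += n_samples[name] * (diff.T @ diff)
    print(SB)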

Compute SW (within-class variance)

The within-class variance for class 1 (SW1) is given by,

SW1 = d1^T * d1

where,
d1 = matrix of differences between the samples of class 1 and the class-1 mean

and the total within-class variance is,

SW = SW1 + SW2 + SW3

Use the get_group method to fetch the data of each class and store it in matrix form

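A sketch using get_group, following the formula above:

    # Within-class variance: sum over classes of d_k^T d_k, where d_k holds the
    # deviations of the class-k samples from the class-k mean
    groups = IRIS.groupby('species')
    SW = np.zeros((len(features), len(features)))
    for name in mean_class.index:
        Xk = groups.get_group(name)[features].values   # samples of class k (50 x 4)
        dk = Xk - mean_class.loc[name].values          # deviations from the class mean
        SW += dk.T @ dk
    print(SW)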

Compute the Transformation Matrix (W)

The dimensionality reduction criterion is met by maximizing the transformation matrix (W), which is given by,

W = SB / SW

or, in matrix form,

W = SW^-1 * SB

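In code:

    # Transformation matrix: W = SW^-1 * SB
    W = np.linalg.inv(SW) @ SB
    print(W)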

LDA Subspace

Now, we need to find the axes along which the discrimination of the classes is most visible, i.e. the directions that maximize W. For this, we compute the eigenvectors and eigenvalues of W: each eigenvector represents the direction of an axis, and its eigenvalue represents the corresponding magnitude (discriminative power) along that axis. Since we want to maximize W, the eigenvectors with the highest eigenvalues are the axes to be selected.

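A sketch of the eigen decomposition with NumPy:

    # Eigen decomposition of W; column i of V is the eigenvector that belongs
    # to eigenvalue L[i]
    L, V = np.linalg.eig(W)
    L, V = L.real, V.real   # the eigenvalues are real here; drop numerical noise
    print(L)
    print(V)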

Here, each eigenvalue in L corresponds to the eigenvector stored in the matching column of V.

X Data on 2 LDA Axes

We want to reduce the dimension from 4D (4 attributes) to 2D, so we keep only 2 LDA axes.

Classification plot 1

As can be seen, the first two eigenvalues are quite high compared to the others, so we use only the first two eigenvector columns. The first eigenvector column defines the LDA1 axis and the second column the LDA2 axis for classification plot 1.

Classification plot 2

Later, we will produce a second plot, classification plot 2, built from the eigenvector columns with the highest and the lowest eigenvalues.

Access the two eigenvector columns

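A sketch of selecting the two leading eigenvector columns:

    # Sort the eigenvalues in descending order and keep the two leading
    # eigenvectors as the 4 x 2 projection matrix (LDA1, LDA2)
    order = np.argsort(L)[::-1]
    W2 = V[:, order[:2]]
    print(W2)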

Transformed output, Z

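The projection itself:

    # Project the original 150 x 4 data onto the two LDA axes
    X = IRIS[features].values
    Z = X @ W2                 # 150 x 2 matrix of LDA coordinates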

Create the dataframe

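For example:

    # Dataframe of the LDA coordinates plus the class label, for plotting
    Z_df = pd.DataFrame(Z, columns=['LDA1', 'LDA2'])
    Z_df['species'] = IRIS['species']
    print(Z_df.head())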

Classification Plot 1

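A sketch of the scatter plot on the two leading LDA axes:

    fig, ax = plt.subplots(figsize=(8, 6))
    for name, group in Z_df.groupby('species'):
        ax.scatter(group['LDA1'], group['LDA2'], label=name)
    ax.set_xlabel('LDA1')
    ax.set_ylabel('LDA2')
    ax.set_title('Classification plot 1')
    ax.legend()
    plt.show()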

Classification Plot 2

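A sketch of the same plot built from the eigenvectors with the highest and the lowest eigenvalues, as described above:

    # Project onto the highest- and lowest-eigenvalue eigenvectors instead
    W_alt = V[:, [order[0], order[-1]]]
    Z_alt = pd.DataFrame(X @ W_alt, columns=['LDA1', 'LDA2'])
    Z_alt['species'] = IRIS['species']

    fig, ax = plt.subplots(figsize=(8, 6))
    for name, group in Z_alt.groupby('species'):
        ax.scatter(group['LDA1'], group['LDA2'], label=name)
    ax.set_xlabel('LDA1')
    ax.set_ylabel('LDA2')
    ax.set_title('Classification plot 2')
    ax.legend()
    plt.show()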

The two classification plots are quite different: in plot 2, discriminating the classes along the LDA2 axis is very difficult.

LDA via Scikit-learn

  1. Import the scikit-learn library
  2. Create the LDA object
  3. Fit the LDA object on the data (see the sketch below)
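
A sketch of the scikit-learn version, using the LinearDiscriminantAnalysis class with n_components=2:

    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Fit scikit-learn's LDA and project the data onto 2 components
    lda = LinearDiscriminantAnalysis(n_components=2)
    Z_sk = lda.fit_transform(X, IRIS['species'])

    fig, ax = plt.subplots(figsize=(8, 6))
    for name in data.target_names:
        mask = (IRIS['species'] == name).values
        ax.scatter(Z_sk[mask, 0], Z_sk[mask, 1], label=name)
    ax.set_xlabel('LDA1')
    ax.set_ylabel('LDA2')
    ax.set_title('LDA via scikit-learn')
    ax.legend()
    plt.show()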

From the above plot,

  • Discrimination of the classes along the LDA1 axis can clearly be seen
  • Along the LDA2 axis, discrimination is not possible