ML | Linear Discriminant Analysis - GeeksforGeeks (2023)

  1. Linear Discriminant Analysis (LDA) is a supervised learning algorithm used for classification tasks in machine learning. It is a technique used to find a linear combination of features that best separates the classes in a dataset.
  2. LDA works by projecting the data onto a lower-dimensional space that maximizes the separation between the classes. It does this by finding a set of linear discriminants that maximize the ratio of between-class variance to within-class variance. In other words, it finds the directions in the feature space that best separate the different classes of data.
  3. LDA assumes that each class follows a Gaussian distribution and that the covariance matrices of the different classes are equal. Under these assumptions the decision boundary between any two classes is linear, so LDA works best when the classes are approximately linearly separable.
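As a rough illustration of the projection described above, the following sketch fits scikit-learn's `LinearDiscriminantAnalysis` to the library's bundled Wine dataset. The dataset choice is an assumption made here for convenience and is not part of the original article. With three classes, LDA can produce at most two discriminant directions.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Wine data: 178 samples, 13 features, 3 classes (dataset choice is an assumption)
X, y = load_wine(return_X_y=True)

# With C classes, LDA yields at most C - 1 discriminant directions (here 2)
lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, y)

print(X_proj.shape)                   # one row per sample, one column per discriminant
print(lda.explained_variance_ratio_)  # fraction of variance explained by each discriminant
```

The 13 original features are replaced by two linear combinations chosen with the class labels in mind, which is what distinguishes LDA from unsupervised methods such as PCA.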

LDA has several advantages, including:

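  • It is a simple and computationally efficient algorithm.
  • With a regularized (shrinkage) estimate of the covariance matrix, it can still work when the number of features is much larger than the number of training samples.
  • It combines correlated features into a small number of discriminant directions, although strongly collinear features make the covariance estimate less stable.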

However, LDA also has some limitations, including:

  • It assumes that the data has a Gaussian distribution, which may not always be the case.
  • It assumes that the covariance matrices of the different classes are equal, which may not be true in some datasets.
  • It produces linear decision boundaries, so it performs poorly on datasets whose classes are not approximately linearly separable.
  • Without regularization, it may not perform well in high-dimensional feature spaces, where the covariance matrix is poorly estimated or singular.
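The high-dimensional case can be explored with a small experiment. The sketch below is only an illustration on synthetic data (the sample sizes, the mean shift and the random seed are all assumptions). It compares plain LDA with the shrinkage-regularized variant that scikit-learn offers through the `shrinkage` parameter of the `lsqr` solver.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 60 samples, 200 features,
# two Gaussian classes whose means differ in the first 5 features only
rng = np.random.default_rng(0)
n, p = 60, 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 1.5

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# Shrinkage regularizes the covariance estimate; it needs the 'lsqr' or 'eigen' solver
plain = LinearDiscriminantAnalysis(solver="lsqr").fit(X_tr, y_tr)
shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X_tr, y_tr)

print("no shrinkage  :", plain.score(X_te, y_te))
print("auto shrinkage:", shrunk.score(X_te, y_te))
```

The exact scores depend on the random draw, so the numbers are best read as a comparison between the two estimators rather than as a benchmark.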

Linear Discriminant Analysis (LDA), also known as Normal Discriminant Analysis or Discriminant Function Analysis, is a dimensionality reduction technique commonly used for supervised classification problems. It models the differences between groups, i.e. it separates two or more classes, by projecting the features from a higher-dimensional space into a lower-dimensional space.
For example, suppose we have two classes that we need to separate efficiently. Classes can have multiple features. Using only a single feature to classify them may result in some overlap, as shown in the figure below. So, we keep increasing the number of features until the classes can be classified properly.

[Figure: two classes overlapping when separated on a single feature]

Example:
Suppose we have two sets of data points belonging to two different classes that we want to classify. As shown in the given 2D graph, when the data points are plotted on the 2D plane, there’s no straight line that can separate the two classes of the data points completely. Hence, in this case, LDA (Linear Discriminant Analysis) is used which reduces the 2D graph into a 1D graph in order to maximize the separability between the two classes.

[Figure: 2D plot of two classes that no straight line separates completely]

Here, Linear Discriminant Analysis uses both axes (X and Y) to create a new axis and projects the data onto it in a way that maximizes the separation of the two categories, thereby reducing the 2D graph to a 1D graph.

Two criteria are used by LDA to create a new axis:

  1. Maximize the distance between means of the two classes.
  2. Minimize the variation within each class.
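The two criteria can be combined into a single score for any candidate axis. The sketch below is a hedged illustration: the toy data, the helper name `fisher_score` and the three candidate directions are all hypothetical choices. It projects two 2D classes onto an axis and divides the squared distance between the projected means by the total within-class scatter.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical 2D classes with the same elongated covariance
cov = [[3.0, 2.5], [2.5, 3.0]]
c1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
c2 = rng.multivariate_normal([2.0, 0.0], cov, size=200)

def fisher_score(v, a, b):
    """Squared distance between projected means divided by summed projected scatter."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    ya, yb = a @ v, b @ v                    # 1D projections of each class
    between = (ya.mean() - yb.mean()) ** 2   # criterion 1: separation of the means
    within = ((ya - ya.mean()) ** 2).sum() + ((yb - yb.mean()) ** 2).sum()  # criterion 2
    return between / within

for name, v in [("X axis", [1, 0]), ("Y axis", [0, 1]), ("diagonal", [1, -1])]:
    print(name, fisher_score(v, c1, c2))
```

In this toy setup the diagonal axis scores higher than the X axis even though the class means differ only along X, because the classes are much narrower along the diagonal. This is why LDA considers both criteria together rather than the distance between means alone.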

[Figure: new axis (in red) drawn on the 2D graph of the two classes]

In the above graph, it can be seen that a new axis (in red) is generated and plotted in the 2D graph such that it maximizes the distance between the means of the two classes and minimizes the variation within each class. In simple terms, this newly generated axis increases the separation between the data points of the two classes. After generating this new axis using the above-mentioned criteria, all the data points of the classes are plotted on this new axis and are shown in the figure given below.

[Figure: data points of both classes projected onto the new axis]

However, Linear Discriminant Analysis fails when the means of the distributions are shared, because it becomes impossible for LDA to find a new axis that makes the two classes linearly separable. In such cases, we use non-linear discriminant analysis.
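A small experiment makes this failure mode concrete. In the sketch below, both classes are centred at the origin and differ only in spread. The data and the seed are assumptions, and Quadratic Discriminant Analysis is used here as one example of a non-linear alternative rather than the only option.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)
# Hypothetical data: both classes are centred at the origin but differ in spread
inner = rng.normal(scale=0.5, size=(500, 2))
outer = rng.normal(scale=3.0, size=(500, 2))
X = np.vstack([inner, outer])
y = np.repeat([0, 1], 500)

lda_acc = LinearDiscriminantAnalysis().fit(X, y).score(X, y)
qda_acc = QuadraticDiscriminantAnalysis().fit(X, y).score(X, y)
print("LDA accuracy:", lda_acc)  # near chance, since the class means coincide
print("QDA accuracy:", qda_acc)  # higher, because QDA models a covariance per class
```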

Mathematics

Let’s suppose we have two classes and n d-dimensional samples x_1, x_2, …, x_n, where:

  • n1 samples come from class c1 and n2 samples come from class c2.

If x_i is a data point, then its projection onto the line represented by the unit vector v can be written as v^{T}x_i.

Let \mu_1 and \mu_2 be the means of the samples of classes c1 and c2 before projection, and let \widetilde{\mu_1} denote the mean of the samples of class c1 after projection. It can be calculated by:

\widetilde{\mu_1} = \frac{1}{n_1} \sum_{x_i \in c_1} v^{T}x_i = v^{T}\mu_1

Similarly,

\widetilde{\mu_2} = v^{T}\mu_2

Now, in LDA we need to normalize |\widetilde{\mu_1} -\widetilde{\mu_2} | by the scatter of the projected samples. Let y_i = v^{T}x_i be the projected samples, then the scatter for the samples of c1 is:

\widetilde{s_1}^2 = \sum_{y_i \in c_1} (y_i - \widetilde{\mu_1})^2

Similarly:

\widetilde{s_2}^2 = \sum_{y_i \in c_2} (y_i - \widetilde{\mu_2})^2

Now, we need to project our data onto the line with direction v that maximizes

J(v) = \frac{|\widetilde{\mu_1} - \widetilde{\mu_2}|^2}{\widetilde{s_1}^2 + \widetilde{s_2}^2}

To maximize the above equation, we need to find a projection vector that maximizes the difference of the means and reduces the scatter of both classes. Now, the scatter matrices s_1 and s_2 of classes c1 and c2 are:

s_1 = \sum_{x_i \in c_1} (x_i - \mu_1)(x_i - \mu_1)^{T}

and

s_2 = \sum_{x_i \in c_2} (x_i - \mu_2)(x_i - \mu_2)^{T}

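After simplifying, the scatter of the projected samples of each class can be written in terms of its scatter matrix:

\widetilde{s_1}^2 = v^{T}s_1 v, \quad \widetilde{s_2}^2 = v^{T}s_2 v

Now, we define the scatter within the classes (s_w) and the scatter between the classes (s_b):

s_w = s_1 + s_2

s_b = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^{T}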

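Now, we simplify the numerator of J(v):

|\widetilde{\mu_1} - \widetilde{\mu_2}|^2 = (v^{T}\mu_1 - v^{T}\mu_2)^2 = v^{T}(\mu_1 - \mu_2)(\mu_1 - \mu_2)^{T}v = v^{T}s_b v

The denominator is v^{T}s_1 v + v^{T}s_2 v = v^{T}s_w v, so

J(v) = \frac{v^{T}s_b v}{v^{T}s_w v}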

Now, to maximize the above equation we differentiate J(v) with respect to v and set the derivative to zero:

\frac{dJ(v)}{dv} = 0 \implies s_b v = \lambda s_w v, where \lambda = J(v)

If s_w is invertible, this is the eigenvalue problem

s_w^{-1} s_b v = \lambda v

Here, for the maximum value of J(v) we use the eigenvector corresponding to the highest eigenvalue. This gives the best projection direction for LDA.
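The derivation can be checked numerically. The following NumPy sketch uses hypothetical data, and all variable names are chosen here for illustration. It builds the within-class and between-class scatter matrices, solves the eigenvalue problem, and compares the result with the closed-form direction s_w^{-1}(\mu_1 - \mu_2) that holds in the two-class case.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-class data in d = 3 dimensions
cov = np.array([[2.0, 0.8, 0.3], [0.8, 1.0, 0.2], [0.3, 0.2, 0.5]])
X1 = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=150)
X2 = rng.multivariate_normal([1.5, 1.0, 0.5], cov, size=150)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Scatter matrix of each class, then within- and between-class scatter
s1 = (X1 - mu1).T @ (X1 - mu1)
s2 = (X2 - mu2).T @ (X2 - mu2)
s_w = s1 + s2
diff = (mu1 - mu2).reshape(-1, 1)
s_b = diff @ diff.T

# Eigenvectors of s_w^{-1} s_b; keep the one with the largest eigenvalue
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
v = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
v = v / np.linalg.norm(v)

# For two classes the same direction has the closed form s_w^{-1} (mu1 - mu2)
v_closed = np.linalg.solve(s_w, mu1 - mu2)
v_closed = v_closed / np.linalg.norm(v_closed)

def J(v):
    """Fisher criterion: between-class over within-class scatter along v."""
    return float((v @ s_b @ v) / (v @ s_w @ v))

print("direction:", v)
print("J(v):", J(v), "largest eigenvalue:", np.max(np.real(eigvals)))
```

The eigenvector and the closed-form direction agree up to sign, and J(v) evaluated at that eigenvector equals the largest eigenvalue, as the relation s_b v = \lambda s_w v predicts.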

Extensions to LDA:

  1. Quadratic Discriminant Analysis (QDA): Each class uses its own estimate of variance (or covariance when there are multiple input variables).
  2. Flexible Discriminant Analysis (FDA): Uses non-linear combinations of inputs, such as splines.
  3. Regularized Discriminant Analysis (RDA): Introduces regularization into the estimate of the variance (actually covariance), moderating the influence of different variables on LDA.

Implementation

  • In this implementation, we will perform linear discriminant analysis using the Scikit-learn library on the Iris dataset.

Python3

# necessary imports
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# read dataset from URL
# NOTE: `url` is not defined in this listing; set it to the location of the
# Iris CSV file (no header row) before running
cls = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv(url, names=cls)

# divide the dataset into features (X) and target variable (y)
X = dataset.iloc[:, 0:4].values
y = dataset.iloc[:, 4].values

# encode the class labels and divide into train and test sets
le = LabelEncoder()
y = le.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# standardize the features, fitting the scaler on the training set only
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# apply Linear Discriminant Analysis
lda = LinearDiscriminantAnalysis(n_components=2)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)

# plot the scatterplot of the training data on the two discriminant axes
plt.scatter(
    X_train[:, 0], X_train[:, 1], c=y_train, cmap='rainbow',
    alpha=0.7, edgecolors='b'
)
plt.show()

# classify using random forest classifier
classifier = RandomForestClassifier(max_depth=2, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# print the accuracy and confusion matrix
print('Accuracy : ' + str(accuracy_score(y_test, y_pred)))
conf_m = confusion_matrix(y_test, y_pred)
print(conf_m)


[Figure: LDA 2-variable plot]

Accuracy : 0.9
[[10  0  0]
 [ 0  9  3]
 [ 0  0  8]]
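For readers who want a version that runs without an external file, here is a minimal sketch under a few assumptions: it loads scikit-learn's bundled copy of Iris with `load_iris`, fixes `random_state=0` and stratifies the split so that the result is reproducible, and uses LDA itself as the classifier instead of passing the projected data to a random forest. None of these choices come from the listing above.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Bundled copy of Iris, used here so the sketch needs no download (an assumption)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# LDA is itself a classifier, so no second model is required
lda = LinearDiscriminantAnalysis(n_components=2)
lda.fit(X_train, y_train)
y_pred = lda.predict(X_test)

print("Accuracy :", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(lda.transform(X_test).shape)  # (30, 2): test samples on the two discriminants
```

Using LDA directly keeps the pipeline to a single model. The two-step approach in the listing above is still useful when the projected features feed a different downstream classifier.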

Applications:

  1. Face Recognition: In the field of Computer Vision, face recognition is a very popular application in which each face is represented by a very large number of pixel values. Linear discriminant analysis (LDA) is used here to reduce the number of features to a more manageable number before classification. Each of the new dimensions generated is a linear combination of pixel values, which forms a template. The linear combinations obtained using Fisher’s linear discriminant are called Fisherfaces.
  2. Medical: In this field, linear discriminant analysis (LDA) is used to classify a patient’s disease state as mild, moderate, or severe based on the patient’s various parameters and the medical treatment they are undergoing. This helps doctors to intensify or reduce the pace of the treatment.
  3. Customer Identification: Suppose we want to identify the type of customers who are most likely to buy a particular product in a shopping mall. With a simple question-and-answer survey, we can gather the features of the customers. Here, linear discriminant analysis helps us identify and select the features that describe the group of customers most likely to buy that product.

Article information

Author: The Hon. Margery Christiansen

Last Updated: 09/24/2023
