## Naïve Bayes Implementation using Python

The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem. It is used for solving classification problems, i.e., predicting qualitative (categorical) outcomes. It is simple and extremely fast relative to other classification algorithms, and it takes a probabilistic approach to assigning data points to classes.

#### Naïve Bayes

• Naïve: it is called naïve because it assumes that the occurrence of one attribute (feature) is independent of the occurrence of the other attributes. This independence assumption is a weakness of the algorithm, yet it still predicts with good accuracy in practice.
• Bayes' Theorem: in probability theory and statistics, Bayes' theorem (also called Bayes' law or Bayes' rule) describes the probability of an event based on prior knowledge of conditions that might be related to the event. The formula for Bayes' theorem is

P(A|B) = P(B|A) · P(A) / P(B)

where
P(A|B) is the posterior probability: the probability of hypothesis A given the observed evidence B.
P(B|A) is the likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.
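As a quick numerical illustration of the theorem (all probabilities below are made up for the example, not taken from the Iris data):

```python
# Illustrative numbers only: a hypothesis A with prior P(A) = 0.3,
# evidence B with likelihood P(B|A) = 0.8 and P(B|not A) = 0.2.
p_a = 0.3
p_b_given_a = 0.8
p_b_given_not_a = 0.2

# Marginal probability of the evidence, via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: posterior = likelihood * prior / marginal.
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # 0.6316
```

Observing the evidence raised the probability of the hypothesis from 0.3 to about 0.63.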

#### Types of Naïve Bayes Model

The common variants are Gaussian Naïve Bayes (for continuous attributes), Multinomial Naïve Bayes (for count data), and Bernoulli Naïve Bayes (for binary attributes). Since the Iris attributes are continuous measurements, we use the Gaussian variant here.

#### Dataset

We will explore the Iris dataset, one of the best-known datasets in the pattern recognition literature. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other.
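One convenient way to load it (assuming scikit-learn is installed; the dataset also ships with several other libraries) is:

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # 150 rows, 4 features; labels 0, 1, 2

print(X.shape)          # (150, 4)
print(set(y.tolist()))  # {0, 1, 2}
```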

#### Gaussian Naïve Bayes

Since the attributes are continuous, we use Gaussian Naïve Bayes to compute the likelihoods. The Gaussian model assumes that each attribute follows a normal distribution. The Gaussian probability density function is

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

Say we have 4 attributes (x1, x2, x3, x4) and 3 classes (0, 1, 2). Then:

x = value of an attribute (say x1)
μ = mean of the attribute (x1) for a particular class (say class 0)
σ = standard deviation of x1 for class 0

Similarly, we compute f(x) for the other classes and repeat the same operation for the other attributes. f(x) is used as the likelihood, i.e., P(B|A).

Note:

• In Bayes' theorem, the marginal probability P(B) represents the probability of the observed attribute values. Since P(B) is the same for every class, we can skip computing it when comparing classes.
• P(A) represents the probability of each class within the data.

#### Train/Test Split
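A minimal split helper might look like this (a sketch; the function name, the 80/20 ratio, and the fixed seed are our own choices, not fixed by the article):

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and split them into train and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

data = [(i, i % 3) for i in range(10)]  # stand-in rows: (feature, class)
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```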

#### Compute Mean and Standard Deviation

We compute the mean and standard deviation of each attribute (feature) grouped by class. These values are computed from the training dataset only.
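A sketch of the grouping step, assuming each row stores its attribute values followed by its class label (the row layout and function name are our own assumptions):

```python
import statistics
from collections import defaultdict

def summarize_by_class(rows):
    """rows: sequences whose last element is the class label.
    Returns {class: [(mean, stdev) for each attribute]}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[-1]].append(row[:-1])
    summaries = {}
    for label, samples in grouped.items():
        # zip(*samples) iterates over the attribute columns of this class.
        summaries[label] = [(statistics.mean(col), statistics.stdev(col))
                            for col in zip(*samples)]
    return summaries

rows = [(1.0, 2.0, 0), (3.0, 4.0, 0), (10.0, 20.0, 1), (12.0, 22.0, 1)]
print(summarize_by_class(rows)[0][0])  # (2.0, 1.4142135623730951)
```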

#### Class Probability

The class prior probabilities obtained are 0.333 (class 0), 0.3416 (class 1), and 0.325 (class 2).
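The priors are just the relative class frequencies in the training data; a balanced sample gives roughly 1/3 per class, matching the values above up to the randomness of the split:

```python
from collections import Counter

def class_priors(labels):
    """Relative frequency of each class label, i.e. P(A) for each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# The full Iris data has 50 instances per class, so each prior is 1/3.
labels = [0] * 50 + [1] * 50 + [2] * 50
print(round(class_priors(labels)[0], 3))  # 0.333
```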

#### Gaussian Probability Density Function
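The density formula given earlier translates directly into a small helper (standard library only; the function name is our own):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density of x under N(mu, sigma^2)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# At x = mu the density peaks at 1 / (sigma * sqrt(2*pi)).
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```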

#### Cross-Validation

We write a function to shuffle the data so that we can validate the model on different splits. After shuffling, only the held-out test data is used for verification.

#### Prediction and Accuracy Test

The average accuracy is found to be 97%, which shows that the model gives good predictions.
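Putting the pieces together, a class is predicted by multiplying its prior by the per-attribute Gaussian likelihoods and taking the highest-scoring class (a self-contained sketch with illustrative numbers; all names are our own):

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def predict(summaries, priors, attributes):
    """summaries: {class: [(mean, stdev) per attribute]}; priors: {class: P(class)}.
    Returns the class with the highest prior * likelihood score (P(B) is ignored)."""
    best_label, best_score = None, -1.0
    for label, stats in summaries.items():
        score = priors[label]
        for x, (mu, sigma) in zip(attributes, stats):
            score *= gaussian_pdf(x, mu, sigma)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(truth, predicted):
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)

# Two toy classes with well-separated attribute means (illustrative numbers).
summaries = {0: [(1.0, 0.5), (2.0, 0.5)], 1: [(5.0, 0.5), (6.0, 0.5)]}
priors = {0: 0.5, 1: 0.5}
print(predict(summaries, priors, [1.1, 2.2]))  # 0
print(predict(summaries, priors, [4.9, 5.8]))  # 1
```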