Naïve Bayes' Implementation using Python

Naïve Bayes' is a supervised learning algorithm based on Bayes' theorem. It is used for classification problems, i.e. predicting a qualitative (categorical) outcome. It is simple and very fast compared with many other classification algorithms. It takes a probabilistic approach: each sample is assigned to the class that is most probable given its attributes.

Naïve Bayes'

  • Naïve:- It is called Naïve because it assumes that the value of each attribute (feature) is independent of the values of the other attributes, given the class. This assumption rarely holds exactly in real data, which is a limitation of the algorithm, yet it often still predicts with good accuracy.
  • Bayes' Theorem:- In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the probability of an event based on prior knowledge of conditions that might be related to the event. The formula for Bayes' theorem is,

    $$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$

    P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
    P(B|A) is the likelihood: the probability of the evidence B given that hypothesis A is true.
    P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
    P(B) is the marginal probability: the probability of the evidence.
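
As a quick numeric illustration of the four terms above, the sketch below plugs made-up probabilities into Bayes' theorem. The numbers are not taken from the Iris data, and the function name `posterior` is only a hypothetical helper.

```python
def posterior(likelihood, prior, marginal):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if marginal <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * prior / marginal


# Made-up numbers, for illustration only.
p_b_given_a = 0.8   # likelihood P(B|A)
p_a = 0.3           # prior P(A)
p_b = 0.5           # marginal P(B)

print(posterior(p_b_given_a, p_a, p_b))
```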

Types of Naïve Bayes' Model


We will explore the Iris dataset, perhaps the best known dataset in the pattern recognition literature. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other.

Gaussian Naïve Bayes'

Since the attributes are continuous, we will use Gaussian Naïve Bayes' to compute the likelihoods. The Gaussian model assumes that, within each class, every attribute follows a normal distribution. The Gaussian probability density function is given by,

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$


Let's say we have 4 attributes (x1, x2, x3, x4) and 3 classes (0, 1, 2). Then:
x = value of an attribute (say x1)
μ = mean of that attribute (x1) for a particular class (say 0)
σ = standard deviation of x1 for class 0

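Similarly, we compute f(x) for the other classes and repeat the same operation for the other attributes. f(x) gives the likelihood, i.e. P(B|A); under the naïve independence assumption, the likelihood of a whole row is the product of the f(x) values of its attributes.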


  • In Bayes' theorem, the marginal probability P(B) is the probability of the observed attribute values. Since P(B) is the same for every class, it does not change which class scores highest, so we skip computing it.
  • P(A) is the prior probability of a class, i.e. the fraction of the data that belongs to that class.
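
Taken together, the two points above lead to the following decision rule. This is a sketch in the article's notation; the symbols C_k (class k), μ_{k,i} and σ_{k,i} (mean and standard deviation of attribute i within class k) and ŷ (the predicted class) are introduced here only for illustration.

$$P(C_k \mid x_1, \dots, x_4) \propto P(C_k) \prod_{i=1}^{4} f(x_i \mid \mu_{k,i}, \sigma_{k,i})$$

$$\hat{y} = \arg\max_{k \in \{0, 1, 2\}} \; P(C_k) \prod_{i=1}^{4} f(x_i \mid \mu_{k,i}, \sigma_{k,i})$$

Dividing every class score by the same P(B) would rescale all three scores equally, so the class that attains the maximum is unchanged.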

Load Libraries and Dataset
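
The original loading code is not shown, so the following is only a minimal sketch under stated assumptions: the data is available as CSV text in the layout of the UCI `iris.data` file (four numeric columns followed by the species name, no header row), and only the Python standard library is used. The helper name `parse_iris_csv` is hypothetical, and the article's integer classes (0, 1, 2) suggest the author may have loaded the data differently.

```python
import csv
import io


def parse_iris_csv(text):
    """Parse iris.data-style CSV text into (features, label) rows.

    Assumes four numeric columns followed by the class label and no header.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        features = [float(value) for value in record[:4]]
        rows.append((features, record[4]))
    return rows


# Two sample lines in the assumed layout.
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n"
rows = parse_iris_csv(sample)
print(len(rows), rows[0])
```

To read a file on disk, the same function could be called as `parse_iris_csv(open(path).read())`, where `path` points at a local copy of the data.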


Train/Test Split
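
A minimal sketch of a shuffled split, using plain lists instead of a dataframe. The 80/20 ratio is an assumption: it would leave 120 training rows, which is consistent with the class probabilities reported further down (for example 0.325 = 39/120). The function name and the `seed` parameter are hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(150))  # stand-in for the 150 Iris rows
train, test = train_test_split(rows)
print(len(train), len(test))
```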


Compute Mean and Standard Deviation

We compute the mean and standard deviation of each attribute (feature), grouped by class. These values are computed from the training dataset only.
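
The per-class mean could be computed along these lines. This is a sketch that assumes the training data is a list of `(features, label)` pairs rather than a dataframe; `class_means` is a hypothetical helper, and the tiny dataset is made up.

```python
from collections import defaultdict
from statistics import mean


def class_means(rows):
    """Return {label: [mean of each feature]} for (features, label) rows."""
    grouped = defaultdict(list)
    for features, label in rows:
        grouped[label].append(features)
    # zip(*samples) turns a list of rows into per-feature columns.
    return {label: [mean(column) for column in zip(*samples)]
            for label, samples in grouped.items()}


# Made-up training set with two features and two classes.
train = [([1.0, 10.0], 0), ([3.0, 14.0], 0), ([5.0, 20.0], 1), ([7.0, 22.0], 1)]
print(class_means(train))
```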



Standard Deviation
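
A matching sketch for the per-class standard deviation, again over a list of `(features, label)` pairs. It uses the sample standard deviation (dividing by n - 1), which is also the default of pandas' `std`; whether the original used the sample or the population formula is not shown, so treat that choice as an assumption. `class_stds` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import stdev


def class_stds(rows):
    """Return {label: [sample std of each feature]} for (features, label) rows.

    Each class needs at least two rows, otherwise stdev raises an error.
    """
    grouped = defaultdict(list)
    for features, label in rows:
        grouped[label].append(features)
    return {label: [stdev(column) for column in zip(*samples)]
            for label, samples in grouped.items()}


# Made-up training set with two features and two classes.
train = [([1.0, 10.0], 0), ([3.0, 14.0], 0), ([5.0, 20.0], 1), ([7.0, 22.0], 1)]
print(class_stds(train))
```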


Class Probability
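
The class probability (the prior P(A)) is the share of training rows that belong to each class. A minimal sketch follows; `class_priors` is a hypothetical helper, and the label counts of 40, 41 and 39 are an assumed example for a 120-row training set, not values confirmed by the source.

```python
from collections import Counter


def class_priors(labels):
    """Return {label: fraction of rows carrying that label}."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Assumed label counts for a 120-row training set.
labels = [0] * 40 + [1] * 41 + [2] * 39
for label, prior in class_priors(labels).items():
    print(label, round(prior, 4))
```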


The class probabilities obtained are 0.333 (class 0), 0.3416 (class 1), and 0.325 (class 2).

Gaussian Probability Distribution Function
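
The Gaussian density with mean μ and standard deviation σ can be written as a small function. This is a sketch using only the `math` module; the name `gaussian_pdf` and the sample values are illustrative.

```python
import math


def gaussian_pdf(x, mean, std):
    """Normal density f(x) for the given mean and standard deviation."""
    if std <= 0:
        raise ValueError("std must be positive")
    exponent = -((x - mean) ** 2) / (2 * std ** 2)
    return math.exp(exponent) / (std * math.sqrt(2 * math.pi))


print(gaussian_pdf(5.0, 5.0, 0.35))  # density at the mean (the peak)
print(gaussian_pdf(6.5, 5.0, 0.35))  # far from the mean, close to zero
```

Note that a density is not a probability and can exceed 1 when σ is small; that is fine here because only the relative size of the class scores matters.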



We create a function to shuffle the dataframe so that the model can be verified on different samples. After shuffling, only the test data is used for verification.
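
One way to sketch this step without a dataframe is to draw the test rows at random, without replacement, from a list of rows. `random.Random.sample` returns the selected rows in random order, which is equivalent to shuffling and keeping the first `test_size` rows. The helper name and its parameters are hypothetical.

```python
import random


def shuffled_test_rows(rows, test_size, seed=None):
    """Return `test_size` rows drawn without replacement, in random order."""
    return random.Random(seed).sample(list(rows), test_size)


rows = list(range(150))  # stand-in for the 150 Iris rows
print(shuffled_test_rows(rows, 5, seed=42))
```

Calling the function with different seeds gives different test samples, so the accuracy can be averaged over several runs.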


Prediction and Accuracy Test
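
A minimal sketch of prediction and accuracy, assuming the per-class means, standard deviations and class probabilities are stored in dictionaries keyed by class label. It uses `statistics.NormalDist(...).pdf`, the standard-library version of the Gaussian density. The function names and all numbers are made up for illustration.

```python
from statistics import NormalDist


def predict(features, means, stds, priors):
    """Return the class with the largest prior * product of densities."""
    best_label, best_score = None, -1.0
    for label, prior in priors.items():
        score = prior
        for x, mu, sigma in zip(features, means[label], stds[label]):
            score *= NormalDist(mu, sigma).pdf(x)  # naive independence
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(rows, means, stds, priors):
    """Percentage of (features, label) rows that are predicted correctly."""
    correct = sum(predict(f, means, stds, priors) == y for f, y in rows)
    return 100.0 * correct / len(rows)


# Made-up statistics for two features and two classes.
means = {0: [1.0, 2.0], 1: [5.0, 6.0]}
stds = {0: [0.5, 0.5], 1: [0.5, 0.5]}
priors = {0: 0.5, 1: 0.5}
test_rows = [([1.1, 2.2], 0), ([4.8, 6.1], 1), ([0.9, 1.7], 0)]
print(accuracy(test_rows, means, stds, priors))
```

With many attributes the product of densities can underflow to zero; summing logarithms of the densities is a common alternative, though it is not needed for four attributes.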


The average accuracy is found to be 97%, which shows that the model predicts well on this dataset.