Tuesday, December 6, 2022

Easy understanding of Confusion matrix with examples

Confusion matrix is an important metric to evaluate the performance of classifier. The performance of a classifier depends on their capability predict the class correctly against new or unseen data. It is one of the easiest metrics for finding the correctness and accuracy of the model. The confusion matrix in itself is not a performance measure but all the performance metrics are based on confusion matrix. The ideal situation for any classification model would be when FP=0 and FN=0 but that’s not the case in real life. Depending upon the situation, we might want to minimize either FP or FN. 


Fig 3.10 Confusion Matrix


Let us consider both the situations with the help of example.

Case 1: Minimizing FN

Let,

1 = person having cancer

0 = person not having cancer

In this case, having some False Positive (FP) cases might be okay because classifying a non-cancerous person as cancerous does not affect a lot because on further test we will anyway find out that the particular person does not have cancer. But having False Negative (FN) cases can be hazardous because classifying a cancerous person as non-cancerous can cause serious threat to the life of that person. So, in this case we need to minimize FN.


Case 2: Minimizing FP

Let,

1 = Email is a spam

0 = Email is not spam

In this situation, having False Positive (FP) cases i.e., wrongly classifying non-spam or important email as spam can cause serious damage to the business, financial loss to the individuals and so on. Thus, in this situation we need to minimize FP.

Various metrics based on confusion matrix

  1. Accuracy: It is the number of correct predictions made by the model over all kinds of predictions made. It is a good measure when the target variable classes in the data are nearly balanced.


Accuracy = (TP+TN)/(TP+FP+TN+FN)


  1. Precision: It is defined as the ratio of number of positive samples correctly classified as positive to the total number of samples classified as positive (either correctly or incorrectly). It reflects how reliable the model is in classifying the samples as positive. It is used if the problem is sensitive to classifying a sample as positive in general.


Precision = TP/(TP+FP)


  1. Recall or sensitivity: It is defined as the ratio of number of positive samples correctly classified as positive to the total number of actual positive samples. It measures the model’s ability to detect positive samples. Higher the recall value, more positive samples are detected. It is independent of how the negative samples are classified. It is used if the goal is to detect all the positive samples without caring whether the negative samples would be misclassified as positive.


Recall = TP/(TP+FN)


Related Posts:

  • Topic Modeling: Working of LSA (Latent Semantic Analysis) in simple terms LSA (Latent Semantic Analysis) is another technique used for topic modeling. The main concept behind topic modeling is that the meaning behind any document is based on some latent variables so we use various topic model… Read More
  • Easy understanding of Confusion matrix with examplesConfusion matrix is an important metric to evaluate the performance of classifier. The performance of a classifier depends on their capability predict the class correctly against new or unseen data. It is one of the easiest m… Read More
  • Bi-LSTM in simple words In traditional neural networks, inputs and outputs were independent of each other. To predict next word in a sentence, it is difficult for such model to give correct output, as previous words are required to be remember… Read More
  • Challenges of Sentiment analysis Lack of availability of enough domain-specific datasetMulti-class classification of the dataDifficult to extract the context of the post made by the usersHandling of neutral postsAnalyzing the posts of multiple language… Read More
  • Topic Modeling Topic modeling is an unsupervised technique as it is used to perform analysis on text data having no label attached to it. As the name suggests, it is used to discover a number of topics within the given sets of text li… Read More

0 comments:

Post a Comment