Banner created using canva.com

You have built a classification model that predicts, for instance, whether a customer is a fraud, whether a lead will convert to a customer, whether a customer will churn, or whether an email is spam. How do you evaluate the performance of that model?

The confusion matrix is used for precisely this purpose, and it is usually the first step in the evaluation process. It is one of the oldest techniques, and it is still in use because it is simple and powerful. Not sure why it is called a confusion matrix though.

Let us understand what it is and how it is used in model selection. It is easiest to explain with an example:

Fraud classification use case: we are building a classification model to classify a customer as a fraud or not. The dataset contains 20 customers, each labelled as Fraud or NotFraud. In the second table, I have appended the predicted value to the raw data.

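In the third table, I have added an outcome column to categorise each prediction using the following logic:

True Positive (TP): the customer is a Fraud and the model predicted Fraud.

False Positive (FP): the customer is a NotFraud but the model predicted Fraud.

False Negative (FN): the customer is a Fraud but the model predicted NotFraud.

True Negative (TN): the customer is a NotFraud and the model predicted NotFraud.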

In the diagram below, I have arranged these outcomes in a 2 x 2 matrix of actual versus predicted values. This matrix is called the confusion matrix.
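As a rough sketch of how such a matrix can be tallied in code, the snippet below categorises each prediction and counts the outcomes. The original tables are not reproduced here, so the `actual` and `predicted` lists are hypothetical: they are simply arranged to match the counts implied by the percentages quoted later in this article (2 true positives, 3 false positives, 1 false negative, 14 true negatives). The `outcome` helper is an illustrative name, not part of any library.

```python
from collections import Counter

# Hypothetical labels for 20 customers (assumption: arranged to match the
# counts implied by the article's figures, not the author's actual table).
actual = ["Fraud"] * 3 + ["NotFraud"] * 17
predicted = ["Fraud"] * 2 + ["NotFraud"] * 1 + ["Fraud"] * 3 + ["NotFraud"] * 14


def outcome(actual_label, predicted_label, positive="Fraud"):
    """Categorise one prediction as TP, FP, FN or TN."""
    if predicted_label == positive:
        return "TP" if actual_label == positive else "FP"
    return "FN" if actual_label == positive else "TN"


# Tally the outcome of every (actual, predicted) pair.
counts = Counter(outcome(a, p) for a, p in zip(actual, predicted))

# Lay the counts out as a 2 x 2 matrix: rows = actual, columns = predicted.
print(f"{'':>16}{'Pred Fraud':>12}{'Pred NotFraud':>15}")
print(f"{'Actual Fraud':>16}{counts['TP']:>12}{counts['FN']:>15}")
print(f"{'Actual NotFraud':>16}{counts['FP']:>12}{counts['TN']:>15}")
```

With real data, you would replace the two lists with the label column and the prediction column of your own dataset.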

Now, let's compute a few metrics used for model assessment as mentioned below:

Accuracy: the share of all predictions that the model gets right.

Sensitivity / Recall: of all the actual Frauds, the share that the model predicts as Fraud.

Specificity / True Negative Rate: of all the actual NotFrauds, the share that the model predicts as NotFraud.

Precision: of all the customers the model predicts as Fraud, the share that are actually Frauds.

False Positive Rate: of all the actual NotFrauds, the share that the model incorrectly predicts as Fraud. This means your model has classified a customer as a fraud even though he/she is actually not a fraud.
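For reference, these definitions correspond to the standard formulas below, where TP, TN, FP and FN denote the counts of true positives, true negatives, false positives and false negatives, with Fraud taken as the positive class. The symbols are the usual textbook notation rather than something taken from the original diagrams.

```latex
\begin{align*}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Sensitivity (Recall)} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{False Positive Rate} &= \frac{FP}{FP + TN} = 1 - \text{Specificity}
\end{align*}
```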

Interpretation: Although the model has an 'Accuracy' of 80%, meaning 80% of the customers are classified correctly, it identifies only 67% of the actual frauds (its sensitivity). Since the objective here is to catch the frauds, sensitivity needs to be high, and hence it is not a good model.

Also, the Precision is only 40%, which means that out of the 5 customers predicted as frauds, only 2 actually are frauds. Most of the model's fraud alerts would therefore be false alarms.

In the above example, two key metrics, sensitivity and precision, count against the model, so we should not select it.
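The figures quoted above can be checked with a few lines of code. This is a minimal sketch, assuming the four counts below: they are inferred from the percentages in this article (80% accuracy on 20 customers, 2 of 5 predicted frauds correct, 67% sensitivity) rather than read from the original matrix. The function name `rates` is illustrative.

```python
# Counts inferred from the article's figures (assumption, see lead-in).
tp, fp, fn, tn = 2, 3, 1, 14


def rates(tp, fp, fn, tn):
    """Return the five metrics discussed above as fractions between 0 and 1."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "false_positive_rate": fp / (fp + tn),
    }


metrics = rates(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name:<20}{value:.0%}")
# accuracy            80%
# sensitivity         67%
# specificity         82%
# precision           40%
# false_positive_rate 18%
```

Note that the function does not guard against empty denominators, for example a dataset with no actual frauds; a production version would need to handle that case.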

I hope this gives a high-level understanding of the confusion matrix and its use. I have considered only binary classification problems for simplicity, but you can extend the same logic to classification problems with more than two classes.
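One common way to make that extension, sketched here under my own assumptions, is to build a matrix with one row and one column per class and then treat each class in turn as the positive class (one-vs-rest). The three email classes and the labels below are made up purely for illustration, and `one_vs_rest` is a hypothetical helper name.

```python
# Hypothetical three-class labels, invented for illustration only.
actual = ["spam", "spam", "promo", "promo", "inbox", "inbox", "inbox", "spam"]
predicted = ["spam", "promo", "promo", "inbox", "inbox", "inbox", "spam", "spam"]

classes = sorted(set(actual) | set(predicted))

# matrix[a][p] = number of items whose actual class is a and predicted class is p
matrix = {a: {p: 0 for p in classes} for a in classes}
for a, p in zip(actual, predicted):
    matrix[a][p] += 1


def one_vs_rest(cls):
    """Precision and recall for one class, treating all other classes as negative."""
    tp = matrix[cls][cls]
    fn = sum(matrix[cls].values()) - tp             # rest of the row
    fp = sum(matrix[a][cls] for a in classes) - tp  # rest of the column
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for cls in classes:
    precision, recall = one_vs_rest(cls)
    print(f"{cls:<6} precision={precision:.2f} recall={recall:.2f}")
```

The diagonal of the matrix holds the correct predictions, so overall accuracy is still the sum of the diagonal divided by the total number of items.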

Thanks for reading. If you find this article interesting, please like, share and comment.

Views are personal.

Image credit:

Photo by Andrea Piacquadio from Pexels