Evaluation Glossary Class 10

  • Evaluation Overview
  • What is Evaluation
  • Evaluation Terminologies
  • Scenario
  • Confusion Matrix
  • Confusion Matrix table
  • Parameters for evaluating
  • Accuracy
  • Accuracy Formula
  • Precision
  • Precision Formula
  • Recall
  • Recall Formula
  • F1 Score
  • F1 Score Table
  • Board: CBSE
  • Textbook: Code 417
  • Class: 10
  • Chapter: 8
  • Chapter Name: Evaluation
  • Subject: Artificial Intelligence (417)

Evaluation Overview

So far, we have learned about the Project Cycle and its different components. Now we will study its final component, which is EVALUATION.

What is Evaluation?

Evaluation is a process that critically examines a program. It involves collecting and analyzing information about a program’s activities, characteristics, and outcomes. Its purpose is to make judgments about a program, to improve its effectiveness, and/or to inform programming decisions.

Let me explain this to you.

Evaluation basically means checking the performance of your AI model. It is done with the help of two things: “Prediction” and “Reality”. Evaluation is done as follows:

  • First, find some testing data whose correct outcomes are already known (100% true).
  • Then feed that testing data to the AI model while keeping the correct outcomes with yourself; these known outcomes are termed the “Reality”.
  • Then compare the predicted outcome given by the AI model, called the “Prediction”, with the Reality.
  • You do this to:
    • measure the efficiency and performance of your AI model,
    • find its mistakes and improve it.
[Image: Prediction and Reality]

Try not to use the dataset that was used during Data Acquisition, i.e. the training data, for Evaluation.

  • This is because your model will simply remember the whole training set, and will therefore always predict the correct label for any point in the training set. This is known as overfitting. A small sketch of how to keep testing data separate follows below.
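To make this concrete, here is a minimal Python sketch (the data values are made up purely for illustration) of keeping a separate portion of the data aside for testing, so that evaluation never uses the training examples:

# A minimal sketch: hold some data back for testing so that evaluation
# is never done on the same examples that were used for training.
# The feature values and labels below are made-up placeholders.

data    = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]   # features
reality = [0,   0,   0,   1,   1,   0,   1,   1,   0,   1]      # true labels

# Keep the last 30% of the examples purely for evaluation.
split = int(len(data) * 0.7)
train_data, train_labels = data[:split], reality[:split]
test_data,  test_labels  = data[split:], reality[split:]

# train_data / train_labels -> used only to build the model
# test_data  / test_labels  -> used only to compare Prediction with Reality
print(len(train_data), "training examples,", len(test_data), "testing examples")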

Evaluation Terminologies

There are various terminologies that come into play when we work on evaluating our model. Let’s explore them with the example of a football scenario.

The Scenario

Imagine that you have come up with an AI-based prediction model which has been deployed to identify a football (soccer ball).

Now, the objective of the model is to predict whether the given/shown figure is a football. To understand the efficiency of this model, we need to check whether the predictions it makes are correct or not. Thus, there exist two conditions that we need to consider: Prediction and Reality.

  • The prediction is the output given by the machine.
  • The reality is the real scenario about the figure shown when the prediction is made.

Now let us look at various combinations that we can have with these two conditions.

Case 1

Is this a Football?
  1. Prediction = YES
  2. Reality = YES
  3. True Positive

Here, we can see in the picture that it is a football. The model's prediction is Yes, which means it says it is a football. The Prediction matches the Reality. Hence, this condition is termed True Positive.

[Image: a football]

Case 2

Is this a Football?
  1. Prediction = NO
  2. Reality = NO
  3. True Negative

Here, this is not an image of a football, hence the reality is No. In this case, the machine has predicted it correctly as No. Therefore, this condition is termed True Negative.

[Image: a cricket ball]

Case 3

Is this a Football?
  1. Prediction = YES
  2. Reality = NO
  3. False Positive

Here the reality is that there is no Football. But the machine has incorrectly predicted that there is
a Football. This case is termed False Positive.

[Image: a volleyball]

Case 4

Is this a Football?
  1. Prediction = NO
  2. Reality = YES
  3. False Negative

Here, the football appears in a different look, because of which the Reality is Yes, but the machine has incorrectly predicted it as No, which means the machine says there is no football. Therefore, this case becomes a False Negative.

[Image: a football with a different appearance]

Confusion Matrix

The comparison between the results of Prediction and reality is called the Confusion Matrix.

Evaluation Metrics

The confusion matrix allows us to understand the prediction results. It is not an evaluation metric but a record that can help in evaluation. Let us go through the four football conditions that we just read once again.


Confusion Matrix table

                     Reality: Yes            Reality: No
Prediction: Yes      True Positive (TP)      False Positive (FP)
Prediction: No       False Negative (FN)     True Negative (TN)

Prediction and Reality can be easily mapped together with the help of this confusion matrix.
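To make this mapping concrete, here is a small Python sketch (the prediction and reality lists are invented for illustration) that places every prediction into one of the four cells and counts them:

# Each entry answers "Is this a football?" with "Yes" or "No".
# Both lists below are made-up examples.
predictions = ["Yes", "No", "Yes", "No", "Yes", "No", "No", "Yes"]
reality     = ["Yes", "No", "No",  "Yes", "Yes", "No", "No", "No"]

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}

for predicted, actual in zip(predictions, reality):
    if predicted == "Yes" and actual == "Yes":
        counts["TP"] += 1      # True Positive
    elif predicted == "No" and actual == "No":
        counts["TN"] += 1      # True Negative
    elif predicted == "Yes" and actual == "No":
        counts["FP"] += 1      # False Positive
    else:                      # predicted "No" but the reality was "Yes"
        counts["FN"] += 1      # False Negative

print(counts)   # {'TP': 2, 'TN': 3, 'FP': 2, 'FN': 1}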


Parameters to Evaluate a Model

Now let us go through all the possible combinations of “Prediction” and “Reality” and see how we can use these conditions to evaluate the model.

Methods of Evaluation

Accuracy

Definition: The percentage of correct predictions out of all the observations. A prediction is said to be correct if it matches the reality.

Here, we have two conditions in which the Prediction matches with the Reality:

True Positive

  1. Prediction = YES
  2. Reality = YES

When the model's prediction is Yes and it matches the reality, which is also Yes, the condition is termed True Positive.

True Negative

  1. Prediction = NO
  2. Reality = NO

When the model's prediction is No and it matches the reality, which is also No, the condition is termed True Negative.

Accuracy Formula

Accuracy Word Formula

Accuracy = (Correct Predictions / Total Observations) × 100%

Accuracy Formula

Accuracy = ((TP + TN) / (TP + TN + FP + FN)) × 100%

Here, total observations cover all the possible cases of prediction that can be True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).

Example

Let us go back to the Football example.

Assume that the model always predicts that there is no football, but in reality there is a 2% chance that a football is present. In this case, the model will be right for 98 cases, but for the 2 cases in which there actually was a football, the model will still predict that there is no football.
Here,

  1. True Positives = 0
  2. True Negatives = 98
  3. Total cases = 100
  4. Therefore, accuracy becomes:
    (98 + 0) / 100 = 98%
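The same calculation can be checked with a tiny Python sketch using the counts above:

def accuracy(tp, tn, fp, fn):
    """Accuracy = correct predictions / total observations, as a percentage."""
    return (tp + tn) / (tp + tn + fp + fn) * 100

# The model always says "No"; a football appears in only 2 of the 100 cases,
# so TP = 0, TN = 98, FP = 0, FN = 2.
print(accuracy(tp=0, tn=98, fp=0, fn=2))   # 98.0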

Conclusion

  1. Prediction = Always NO
  2. Reality = 2% Probability of YES
  3. 98% ACCURATE

This is a fairly high accuracy for an AI model. But this parameter is useless for us, as the actual cases in which there was a football were not taken into account.

Hence, there is a need to look at another parameter that takes account of such cases as well.

Precision

Definition: The percentage of true positive cases out of all the cases where the prediction is positive (Yes).

That is, it takes into account:

True Positives

  1. Prediction = YES
  2. Reality = YES

When the model's prediction is Yes and it matches the reality, which is also Yes, the condition is termed True Positive.

False Positives

  1. Prediction = YES
  2. Reality = NO

When the model's prediction is Yes but the reality is No, the condition is termed False Positive.

Precision Formula

Precision Word Formula

Precision = (True Positives / All Predicted Positives) × 100%

Precision Formula

Precision = (TP / (TP + FP)) × 100%

Going back to the football example, assume that in this case the model always predicts that there is a football, irrespective of the reality. In this case, all the positive conditions would be taken into account, that is,

  • True Positive (Prediction = Yes and Reality = Yes)
  • False Positive (Prediction = Yes and Reality = No)

In this case, the players will go and check the ball every time to see whether it is a football or not (that is, whether the reality is True or False).

You might recall the story of the boy who falsely cries out that there are wolves every time and so when they actually arrive, no one comes to his rescue. Similarly, here if the Precision is low (which means there are more False predictions than the actual ones) then the Players would get complacent and might not go and check every time considering it could be a false prediction.

If Precision is high, it means there are more True Positive cases and fewer False Positives.

Example

  1. Prediction = 10 cases of TP
  2. Reality = 20 cases of YES
  3. Precision = 100%

Let us consider that a model has 100% precision. This means that whenever the machine says there's a football, there actually is a football (True Positive).

In the same model, there can be a rare exceptional case in which there actually was a football but the system could not detect it. This is the case of a False Negative condition.

But the precision value would not be affected by it because it does not take FN (False Negative) into account.
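The same point can be shown with a small Python sketch using the numbers from the example above (10 True Positives and no False Positives):

def precision(tp, fp):
    """Precision = True Positives / all positive predictions, as a percentage."""
    return tp / (tp + fp) * 100

# Every "Yes" the model gave was correct, so TP = 10 and FP = 0.
# The footballs it missed are False Negatives; they do not appear
# in the formula, so the precision stays at 100%.
print(precision(tp=10, fp=0))   # 100.0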

Recall

Definition: The fraction of positive cases that are correctly identified.

It mainly takes into account the cases in which, in reality, there was a football, whether or not the machine detected it correctly. That is, it considers:

  • True Positives (There was a football in reality and the model predicted it correctly)
  • False Negatives (There was a football and the model didn’t predict it).

True Positives

  1. Prediction = YES
  2. Reality = YES

When the model's prediction is Yes and it matches the reality, which is also Yes, the condition is termed True Positive.

False Negative

  1. Prediction = NO
  2. Reality = YES

When the model's prediction is No but the reality is Yes, the condition is termed False Negative.

Recall Formula

Recall Word Formula

Recall = (True Positives / All cases that are actually positive) × 100%

Recall Formula

Recall = (TP / (TP + FN)) × 100%

Notice that the numerator in both Precision and Recall is the same: True Positives. In the denominator, however, Precision counts the False Positives while Recall takes the False Negatives into consideration.
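Continuing with the earlier numbers (20 footballs existed in reality, of which the model found 10 and missed 10), here is a matching Python sketch for Recall:

def recall(tp, fn):
    """Recall = True Positives / all cases that are actually positive, as a percentage."""
    return tp / (tp + fn) * 100

# 10 footballs detected (TP) and 10 missed (FN) out of the 20 real ones.
print(recall(tp=10, fn=10))   # 50.0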

F1 Score

Definition: The measure of the balance between precision and recall.

Before going deeper into the F1 Score, we must first understand its definition. It is described as “the balance between precision and recall”: when we cannot decide which of the two metrics is more important, we use the F1 Score.

F1 Score Formula

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

Which Metric is Important?

Let’s look at different cases before coming to a conclusion about which metric is more important, Precision or Recall.

  1. Choosing between Precision and Recall depends on the condition in which the model has been deployed. In a case like Forest Fire, a False Negative can cost us a lot and is risky too. Imagine no alert being given even when there is a Forest Fire. The whole forest might burn down.

2. Another case where a False Negative can be dangerous is Viral Outbreak. Imagine a deadly virus has started spreading and the model which is supposed to predict a viral outbreak does not detect it. The virus might spread widely and infect a lot of people.

3. On the other hand, there can be cases in which the False Positive condition costs us more than False Negatives. One such case is Mining. Imagine a model telling you that there exists treasure at a point and you keep on digging there but it turns out that it is a false alarm. Here, the False Positive case (predicting there is a treasure but there is no treasure) can be very costly.


4. Similarly, let’s consider a model that predicts whether a mail is spam or not. If the model always predicts that the mail is spam, people would not look at it and eventually might lose important information. Here also False Positive condition (Predicting the mail as spam while the mail is not spam) would have a high cost.

Cases of High FN Cost

  1. Forest Fire
  2. Viral Outbreak

Cases of High FP Cost

  1. Spam
  2. Mining

Both the parameters are important


To conclude the argument, we must say that if we want to know if our model’s performance is good, we need these two measures: Recall and Precision.

For some cases, you might have a High Precision but Low Recall or Low Precision but High Recall.

But since both the measures are important, there is a need for a parameter that takes both Precision and Recall into account which is called the F1 Score.

An ideal situation would be when we have a value of 1 (that is, 100%) for both Precision and Recall. In that case, the F1 Score would also be an ideal 1 (100%), which is known as the perfect value for the F1 Score. As the values of both Precision and Recall range from 0 to 1, the F1 Score also ranges from 0 to 1.

F1 Score Table

Let us explore the variations we can have in the F1 Score:

[Table: F1 Score values for different combinations of Precision and Recall]
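These variations follow directly from the formula: the F1 Score always stays close to the smaller of the two values. Here is a short Python sketch (the precision and recall values below are illustrative) that prints a few combinations:

def f1_score(precision, recall):
    """F1 Score = harmonic mean of Precision and Recall (both between 0 and 1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# F1 stays low unless BOTH Precision and Recall are high.
for p, r in [(0.9, 0.9), (0.9, 0.1), (0.1, 0.9), (0.1, 0.1), (1.0, 1.0)]:
    print(f"Precision={p:.1f}  Recall={r:.1f}  ->  F1={f1_score(p, r):.2f}")
# Precision=0.9  Recall=0.9  ->  F1=0.90
# Precision=0.9  Recall=0.1  ->  F1=0.18
# Precision=0.1  Recall=0.9  ->  F1=0.18
# Precision=0.1  Recall=0.1  ->  F1=0.10
# Precision=1.0  Recall=1.0  ->  F1=1.00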

In conclusion, we can say that a model has good performance if the F1 Score for that model is high.