Cross Entropy Loss
In a supervised learning problem for predicting classes, we predict probabilities for the classes. To determine how successfully we are predicting the classes, we require a loss function.
The cross-entropy loss function calculates the loss between the actual class labels and the predicted probabilities.
This can be represented as the following steps (a minimal sketch in code follows the list):
1. Get the predicted probability of each class.
2. Get the actual label of each class (1 or 0).
3. Multiply each actual label with the log of the corresponding predicted probability.
4. Sum the values obtained across all the classes.
5. Multiply the value obtained in Step 4 by -1.
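A minimal sketch of these steps in Python (the helper name `cross_entropy_single` is just illustrative, not from a library):

```python
import numpy as np

def cross_entropy_single(y_true, y_pred):
    """Cross-entropy loss for a single observation.

    y_true: one-hot actual labels, e.g. [0, 1, 0]
    y_pred: predicted probabilities, e.g. [0.05, 0.9, 0.05]
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Steps 3 and 4: multiply each label by the log of its predicted
    # probability, then sum over the classes
    total = np.sum(y_true * np.log(y_pred))
    # Step 5: multiply by -1
    return -1 * total

print(cross_entropy_single([0, 1, 0], [0.05, 0.9, 0.05]))  # ≈ 0.1054
```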
This is the loss for a single observation. To calculate the total loss for a set of observations, we sum the losses of the individual observations; to calculate the average loss, we divide that sum by the number of observations.
Let’s take an example. We develop a fruit prediction model which predicts whether a fruit is an orange, an apple, or a guava.
The results are as follows:
# | Actual Label | Predicted Probability of Actual Class | Log of Predicted Probability |
---|---|---|---|
1 | Orange | 0.9 | log(0.9) |
2 | Apple | 0.6 | log(0.6) |
3 | Guava | 0.7 | log(0.7) |
4 | Apple | 0.4 | log(0.4) |
The total loss is therefore -1 * ( log(0.9) + log(0.6) + log(0.7) + log(0.4) ), and the average loss is that total divided by 4.
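As a quick check, a short snippet (assuming the natural log, which is what NumPy and Keras use) evaluates this total and the corresponding average:

```python
import numpy as np

# Predicted probability of the actual class for each of the four fruits above
probs = np.array([0.9, 0.6, 0.7, 0.4])

total_loss = -np.sum(np.log(probs))     # ≈ 1.889
average_loss = total_loss / len(probs)  # ≈ 0.472
print(total_loss, average_loss)
```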
Let’s go deeper into the 1st row. Here the fruit is an orange, so the class label for orange is 1 and the class labels for apple and guava are both 0. The cross-entropy loss would therefore be
-1 * [ 1 * log( Predicted Probability of Orange ) +
0 * log( Predicted Probability of Apple ) +
0 * log( Predicted Probability of Guava ) ]
= -1 * log( Predicted Probability of Orange )
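A small sketch of this masking effect; note that only the 0.9 for orange appears in the table above, so the probabilities for apple and guava here are assumed for illustration:

```python
import numpy as np

# One-hot label for the 1st row, ordered as [orange, apple, guava]
y_true = np.array([1, 0, 0])
# Assumed predicted probabilities; only the 0.9 for orange is given above
y_pred = np.array([0.9, 0.06, 0.04])

# The zeros in y_true cancel every term except the one for the true class
loss = -np.sum(y_true * np.log(y_pred))
print(loss, -np.log(0.9))  # both print the same value, ≈ 0.1054
```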
- When the predicted probability of the actual class is closer to 1, the loss is lower, since log(1) = 0
- When the predicted probability of the actual class is closer to 0, the loss is higher, since log(p) grows toward negative infinity as p approaches 0
Therefore, better predictions produce a lower loss, as the sketch below illustrates.
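A quick sketch of this behavior, printing the loss -log(p) for a few values of the predicted probability p of the actual class:

```python
import numpy as np

# The loss shrinks toward 0 as p approaches 1 and grows without bound as p approaches 0
for p in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f'p = {p}: loss = {-np.log(p):.4f}')
```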
Code
We check the concepts below using Keras and by computing the loss from first principles. Please also try it in your favorite environment.
Example 1
In this example, we have a single observation with 3 classes. Computing the loss with Keras and from first principles, the results match.
```python
import tensorflow as tf
import numpy as np

# One observation, one-hot encoded over 3 classes; the actual class
# was predicted with probability 0.9
y_true = [[0, 1, 0]]
y_pred = [[0.05, 0.9, 0.05]]

cce = tf.keras.losses.CategoricalCrossentropy()
print('cce from keras:', cce(y_true, y_pred).numpy())
print('cce from first principles:', -1 * np.log(0.9))
```
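Note that the two values can differ slightly in the last decimal places: as far as we know, the Keras implementation clips the predicted probabilities away from 0 and 1 by a small epsilon to avoid log(0).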
Example 2
In this example, we have 3 observations with 3 classes. Note that we have to average the per-observation losses; the result then matches the Keras function perfectly.
```python
# Three observations, each one-hot encoded over 3 classes
y_true = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
y_pred = [[0.05, 0.9, 0.05], [0.7, 0.2, 0.1], [0.2, 0.2, 0.6]]

cce = tf.keras.losses.CategoricalCrossentropy()
print('cce from keras:', cce(y_true, y_pred).numpy())

# Sum the per-observation losses, then average over the 3 observations
total_loss = -1 * (np.log(0.9) + np.log(0.7) + np.log(0.6))
print('cce from first principles:', total_loss / 3)
```
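The division by 3 mirrors Keras's default behavior: `tf.keras.losses.CategoricalCrossentropy` averages the per-observation losses over the batch by default, rather than summing them.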