Out of all the things that can go wrong with your ML model, overfitting is one of the most common and most detrimental errors.
The bad news is that this is not an exaggeration: overfitting is a frequent issue, and if your model generalizes poorly to new test data, you now have a problem.
What is overfitting?
Overfitting is a common pitfall in deep learning in which a model fits the training data so closely that it ends up memorizing not only the data's patterns but also its noise and random fluctuations. Such a model fails to generalize and performs poorly on unseen data, defeating its purpose.

When can overfitting occur? High variance in model performance is an indicator of an overfitting problem. Both the model's training time and its architectural complexity can cause it to overfit: if the model trains for too long on the training data, or is too complex, it learns the noise and irrelevant information within the dataset.
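The gap between training and test error is the telltale sign. Here is a minimal NumPy sketch of the idea (the sine target, noise level, and polynomial degree are illustrative assumptions, not a prescription): a high-capacity polynomial fit to a handful of noisy points drives training error to nearly zero while test error stays large.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small noisy dataset sampled from a simple underlying function.
def f(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 10)
y_train = f(x_train) + rng.normal(0, 0.2, 10)
x_test = rng.uniform(0, 1, 100)
y_test = f(x_test) + rng.normal(0, 0.2, 100)

def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# A degree-9 polynomial has enough capacity to pass through
# every one of the 10 training points, noise included.
overfit = np.polyfit(x_train, y_train, deg=9)
print("train MSE:", mse(overfit, x_train, y_train))  # near zero: memorized the noise
print("test MSE: ", mse(overfit, x_test, y_test))    # much larger: fails to generalize
```

The large train/test gap is exactly the "high variance" symptom described above.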
Overfitting vs. Underfitting
Underfitting occurs when the model has a high bias, i.e., we are oversimplifying the problem, and as a result the model does not perform well even on the training data.
Overfitting occurs when the model has a high variance, i.e., the model performs well on the training data but does not perform accurately on the evaluation set. The model memorizes the patterns in the training dataset but fails to generalize to unseen examples.
Overfitting happens when:
- The data used for training is not cleaned and contains garbage values. The model captures the noise in the training data and fails to generalize.
- The model has a high variance.
- The training dataset is too small, and the model trains on this limited data for many epochs.
- The architecture of the model has many neural layers stacked together. Deep neural networks are complex, require significant time to train, and can easily end up overfitting the training set.
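The small-dataset, long-training combination can be sketched with plain gradient descent (the degree-12 features, learning rate, and epoch count here are arbitrary illustrative choices): an over-parameterized regression trained for many epochs on 15 points keeps driving training error down, while held-out error stops improving and often starts rising once the model begins fitting the noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny dataset, over-parameterized model: degree-12 polynomial features.
x = rng.uniform(-1, 1, 15)
y = np.sin(3 * x) + rng.normal(0, 0.1, 15)
x_val = rng.uniform(-1, 1, 50)
y_val = np.sin(3 * x_val) + rng.normal(0, 0.1, 50)

def features(x, deg=12):
    return np.vander(x, deg + 1)

Xtr, Xva = features(x), features(x_val)
w = np.zeros(Xtr.shape[1])

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

history = []
for epoch in range(20000):                 # train for a long time on little data
    grad = 2 * Xtr.T @ (Xtr @ w - y) / len(y)
    w -= 0.01 * grad
    if epoch % 1000 == 0:
        history.append((mse(Xtr, y, w), mse(Xva, y_val, w)))

# Training error shrinks throughout; validation error eventually stalls
# (and often rises) as the model starts fitting the noise.
print("first (train, val):", history[0])
print("last  (train, val):", history[-1])
```

Watching the train/validation curves diverge like this is the usual motivation for early stopping.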
Underfitting happens when:
- The training data is unclean and contains noise or outliers, preventing the model from deriving patterns from the dataset.
- The model has a high bias due to the inability to capture the relationship between the input examples and the target values.
- The model is too simple: for example, training a linear model on a complex, nonlinear problem.
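The last point is easy to see numerically. Below is a small sketch (the quadratic target and noise level are illustrative assumptions): a straight line fit to clearly nonlinear data has a large error even on the data it was trained on, which is the signature of high bias.

```python
import numpy as np

rng = np.random.default_rng(2)

# Clearly nonlinear target: a straight line cannot capture it.
x = np.linspace(-2, 2, 50)
y = x ** 2 + rng.normal(0, 0.1, 50)

# High-bias model: a degree-1 polynomial (a line).
line = np.polyfit(x, y, deg=1)
train_mse = np.mean((np.polyval(line, x) - y) ** 2)
print("train MSE:", train_mse)  # large even on the training data: the model underfits
```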
The goal is to find a good fit such that the model picks up the patterns from the training data and does not end up memorizing the finer details.
This, in turn, would ensure that the model generalizes and accurately predicts other data samples.
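One common way to search for that good fit is to score models of increasing capacity on held-out data and keep the one that generalizes best. The sketch below (the sine target and the degree range 1–12 are illustrative assumptions) does this for polynomial degree: too low a degree underfits, too high a degree overfits, and the validation error is lowest somewhere in between.

```python
import numpy as np

rng = np.random.default_rng(3)

def target(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 20)
y_train = target(x_train) + rng.normal(0, 0.2, 20)
x_val = rng.uniform(0, 1, 50)
y_val = target(x_val) + rng.normal(0, 0.2, 50)

def val_mse(deg):
    # Fit on the training split, score on the held-out split.
    coeffs = np.polyfit(x_train, y_train, deg)
    return np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)

# Score a range of model capacities on held-out data.
scores = {deg: val_mse(deg) for deg in range(1, 13)}
best = min(scores, key=scores.get)
print("best degree:", best, "val MSE:", scores[best])
```

The same train/validation split logic applies whatever the model family is; polynomial degree just makes the capacity knob explicit.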