Machine Learning Fundamentals: The Bias and Variance Trade-off Clearly Explained
Understanding bias and variance is essential to training a robust and effective model. Without a good grasp of the two, your model is bound to either overfit or underfit. A robust model strikes the right balance between bias and variance. Now, let's dive into what bias and variance mean, how they affect models, and how we can find this balance.
What is Bias?
Bias is the error that results from wrong or overly simplistic assumptions about how the model should fit or find patterns within the data. Think of bias as a measure of how well the model can capture the true relationship between the features and the target.
Low Bias
A well-trained model should have low bias (to accurately capture the underlying patterns) and a reasonable variance, so that it generalizes well to new data. However, "low bias" is not the absolute target: the bias should be low enough to avoid underfitting, but not so low that it comes at the cost of high variance and overfitting.
Concretely, a low-bias model is flexible and complex enough to capture the intricate patterns, trends, and relationships in the data without making overly simplistic assumptions, resulting in low training error.
While low bias is often desirable, pushing it too far can lead to overfitting, preventing the model from generalizing well to unseen data.
High Bias
On the contrary, high bias means the model is too simplistic to capture the underlying patterns and relationships in the data. This leads to the model underfitting.

What is Variance?
Variance measures how much a model's predictions change when it is trained on different subsets of the training data. Think of variance as an indicator of how the model will perform when tested on unseen data (data it has not seen before).
Low Variance
Low variance indicates the model is not heavily influenced by the specific training data it was given. Models with low variance produce similar results when trained on different subsets of the data and generalize well to unseen data.
Simpler models such as logistic or linear regression are more likely to have low variance, while complex models such as neural networks and decision trees tend to have higher variance. While simpler models are good at keeping variance low, they are also at high risk of underfitting, as they may not capture the underlying patterns in the data well, leading to high bias.
High Variance
A model with high variance is very sensitive to the specific data it was trained on. Such models fit the training data too tightly, so they perform well on the training data but struggle with unseen or test data, resulting in poor generalization.
High-variance models tend to overfit: they capture not only the patterns but also the noise of a particular training set very closely, and then struggle on test data.
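To make this concrete, here is a small illustrative sketch (the synthetic sine-wave data and the polynomial degrees are assumptions chosen for demonstration): a very flexible polynomial fit reaches a much lower error on the training set than on held-out data, which is the signature of high variance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=60)  # noisy target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 15):
    # A low-degree fit is simple (higher bias); a high-degree fit is flexible (higher variance).
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```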

What is the Bias/Variance Trade-off?
The trade-off exists because bias and variance are not independent of each other: together they make up the total error of a model, and they interact. Generally, as you decrease bias by making the model more complex, you increase variance. Conversely, as you decrease variance by simplifying the model, you increase bias.
- High Bias + Low Variance: The model is too simple, leading to underfitting.
- Low Bias + High Variance: The model is too complex, leading to overfitting.
- Low Bias + Low Variance: The ideal scenario where the model captures the underlying patterns without fitting the noise, resulting in good generalization.
The goal is to find an optimal balance between variance and bias, and hence a low total error (expected loss). This does not necessarily mean the lowest possible bias and the lowest possible variance individually, but the lowest total error.
Total Error = Bias² + Variance + Irreducible Error
- Bias: Error due to the assumptions made by the model.
- Variance: Error due to the model’s sensitivity to the training data.
- Irreducible Error: Error due to noise in the data, which cannot be reduced by any model.
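The terms in this decomposition can be estimated empirically. Below is a minimal sketch, assuming a synthetic sine-wave problem and an unpruned decision tree (both arbitrary choices for illustration): the same model class is trained on many freshly sampled training sets, bias² is measured as the squared gap between the average prediction and the true function, and variance as the spread of the individual predictions around that average.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def true_fn(x):
    return np.sin(2 * np.pi * x)

rng = np.random.RandomState(0)
X_test = np.linspace(0, 1, 50).reshape(-1, 1)     # fixed evaluation points

n_repeats, all_preds = 200, []
for _ in range(n_repeats):
    # Draw a fresh noisy training set each time (the "different subsets" of the data).
    X = rng.uniform(0, 1, size=(40, 1))
    y = true_fn(X).ravel() + rng.normal(scale=0.3, size=40)
    model = DecisionTreeRegressor()               # flexible model: low bias, high variance
    all_preds.append(model.fit(X, y).predict(X_test))

all_preds = np.array(all_preds)                   # shape (n_repeats, n_test_points)
avg_pred = all_preds.mean(axis=0)
bias_sq = np.mean((avg_pred - true_fn(X_test).ravel()) ** 2)  # (average prediction - truth)^2
variance = all_preds.var(axis=0).mean()           # spread of predictions around their average
print(f"estimated bias^2 ≈ {bias_sq:.3f}, estimated variance ≈ {variance:.3f}")
```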
Approaches to Handle the Bias/Variance Trade-off
1. Cross Validation: This involves splitting the dataset into multiple subsets (folds) and performing training and validation multiple times, each iteration using a different fold for validation and the rest for training. This helps address variance, as it shows how the model performs on different subsets of the data. If the model performs well on the training data but poorly on the cross-validation folds, it indicates overfitting (high variance). Conversely, poor performance on both training and validation data suggests underfitting (high bias).
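As a quick sketch of cross-validation in practice (the synthetic dataset, the linear model, and the 5-fold/R² choices are illustrative assumptions, not a recommendation):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for your own dataset.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
model = LinearRegression()

# 5-fold CV: each fold is held out once for validation while the rest is used for training.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("R^2 per fold:", scores.round(3))
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```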
2. Regularization: Both L1 (lasso) and L2 (ridge) regularization add a penalty term to the loss function, scaled by a regularization parameter λ, which discourages overly large coefficients.
- Cost/loss function for linear regression: J(θ) = (1/2m)·∑ᵢ₌₁ᵐ(hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + penalty
- Penalty for Lasso (L1): λ∑ⱼ₌₁ⁿ∣θⱼ∣
- Penalty for Ridge (L2): λ∑ⱼ₌₁ⁿθⱼ²
Where:
- m is the number of training examples.
- hθ(x⁽ⁱ⁾) is the predicted value for the i-th training example.
- y⁽ⁱ⁾ is the actual value for the i-th training example.
- θⱼ are the model parameters (coefficients).
- λ is the regularization parameter that controls the amount of regularization applied.
L1 encourages sparsity (it can drive some coefficients exactly to zero), while L2 shrinks the coefficients (parameters) toward zero, thus reducing variance.
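In scikit-learn, the hyperparameter alpha plays the role of λ. The sketch below is illustrative only (synthetic data and arbitrary alpha values, not a tuned setup), but it shows the typical effect: Lasso zeroes out coefficients, Ridge shrinks them.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# 20 features, only 5 of which actually carry signal.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=15.0, random_state=0)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: can drive coefficients exactly to zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients toward zero

print("non-zero coefficients  OLS:", (ols.coef_ != 0).sum(),
      "| Lasso:", (lasso.coef_ != 0).sum())
print(f"largest |coefficient|  OLS: {abs(ols.coef_).max():.2f}"
      f" | Ridge: {abs(ridge.coef_).max():.2f}")
```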
Dropout is another regularization technique, used in neural networks. It involves randomly selecting neurons to be "dropped out" (ignored) during each training iteration. These neurons are temporarily removed along with their connections, preventing them from participating in the forward and backward passes of the network.
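As a minimal sketch of how dropout looks in code (the layer sizes and dropout rates here are arbitrary assumptions, and tf.keras is just one framework that provides a Dropout layer):

```python
import tensorflow as tf

# Dropout layers randomly zero a fraction of the previous layer's activations at each
# training step; they are inactive at inference time.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # drop 50% of units during training
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),   # drop 30% of units during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```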
3. Ensemble Methods: These involve combining multiple models to improve the overall performance and robustness of predictions. The idea is that by aggregating the predictions of several models, the ensemble can often outperform any single model in the set, reducing both bias and variance. This can be done using Bagging (bootstrap aggregating), Boosting, or Stacking.
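A minimal bagging sketch (synthetic data and default settings, purely illustrative): averaging many high-variance decision trees trained on bootstrap samples usually generalizes better than a single tree.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)

single_tree = DecisionTreeRegressor(random_state=0)
# Bagging: 100 trees, each trained on a bootstrap sample, predictions averaged.
bagged_trees = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100, random_state=0)

print(f"single tree  mean CV R^2: {cross_val_score(single_tree, X, y, cv=5).mean():.3f}")
print(f"bagged trees mean CV R^2: {cross_val_score(bagged_trees, X, y, cv=5).mean():.3f}")
```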
4. Feature Selection and Engineering
Feature Selection: This involves removing irrelevant or redundant features that can introduce noise and increase variance.
Feature Engineering: This involves creating new features that better capture the underlying patterns in the data. It can reduce bias by providing the model with more relevant information.
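A small illustrative sketch of both ideas (the synthetic dataset, the choice of k, and the polynomial degree are assumptions made for the example):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import PolynomialFeatures

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

# Feature selection: keep the 5 features most strongly related to the target.
X_selected = SelectKBest(score_func=f_regression, k=5).fit_transform(X, y)
print("after selection:", X_selected.shape)

# Feature engineering: add squared and interaction terms so a linear model can
# represent simple non-linear relationships (reducing bias).
X_engineered = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_selected)
print("after engineering:", X_engineered.shape)
```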
5. Model Selection: This involves choosing the right model or algorithm for the task, taking into account the structure and size of your data. Picking an algorithm whose complexity matches the problem helps prevent both underfitting and overfitting.
6. Increasing the dataset: This is mostly helpful when your model suffers from high variance. Complex models such as neural networks need adequate data; with too little data they tend to memorize the training set rather than the underlying patterns, and adding more data helps them generalize. More data does not, however, fix high bias.
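A learning curve is a handy way to check whether more data is likely to help. The sketch below is illustrative (synthetic data, a random forest, arbitrary training-set sizes): if the training and validation scores converge as the training set grows, the model was variance-limited and extra data pays off; if both stay low and close together, the model is biased and more data alone will not help.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Train on progressively larger slices of the data and cross-validate each one.
sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={int(n):4d}  train acc={tr:.3f}  validation acc={va:.3f}")
```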
Conclusion
Understanding the bias and variance trade-off is essential in machine learning. Overfitting and underfitting are common challenges, and a solid grasp of the trade-off provides the foundation to mitigate these issues. In training a model, the goal is to achieve a balance between bias and variance, not necessarily the lowest possible bias or variance on its own. By managing bias and variance, we can develop models that perform well on both training and unseen data, ensuring robust and reliable performance.
Think of Bias as the model’s ability to fit the training data and Variance as the model’s performance on data it has not seen.
If you can’t explain it simply, you don’t understand it well enough.
~ Albert Einstein