Understanding Regularization in Machine Learning

Prerna Ranjan
7 min read · Feb 22, 2023


In machine learning, there is a concept of regularization. Simply put, regularization is the process of adding information to reduce uncertainty. In the context of machine learning, this typically means adding constraints to a model to prevent overfitting. Overfitting is a problem that can occur when a model is too complex and fits the training data too closely, noise included. This leads to poor performance on new data, because the model has not generalized well. Regularization can help combat overfitting by simplifying the model and making it more robust. In this blog post, we will explore regularization in machine learning in more depth. We will discuss why regularization is important, different methods of regularization, and how to choose the right method for your data.

What is Regularization?

Linear and logistic regression models form the basis of deep-learning neural networks. They are interpretable, fast, and simple to implement. However, during training, regression models tend to chase outliers in the training data, which can leave them unable to generalize to new data. This is called overfitting: the model attempts to cover every data point and ends up capturing the noise as well. To create models that generalize better, we need to regularize them. Many forms of regularization exist, but this article focuses on two of the most popular methods, Lasso (L1) and Ridge (L2).

In machine learning, regularization is a technique used to prevent overfitting. Overfitting occurs when a model is too closely fitted to the training data and does not generalize well to new data. Regularization introduces a penalty that discourages the model from fitting too closely to the training data.

There are two main types of regularization: L1 and L2. L1 regularization encourages sparsity, driving many coefficients to exactly zero. L2 regularization encourages small coefficient values. Both types can be used to prevent overfitting.

In addition to preventing overfitting, regularization can also improve the interpretability of a model. A model with fewer coefficients is easier to interpret than a model with many coefficients. Therefore, using regularization can trade off some accuracy for interpretability.

Regularization is an important technique in machine learning, and understanding how it works is essential for building accurate models that generalize well to new data.

A typical generalization curve illustrates this: the training loss keeps decreasing while the validation loss eventually turns back up.

A high-variance machine learning model captures all the details of the training data along with the noise in that data. Minimizing the training loss is necessary, but after a certain number of iterations the validation loss starts to increase. In simpler terms, by pushing the training loss ever lower we keep increasing the model’s complexity until it can no longer generalize to new data points.

The Different Types of Regularization

When a model suffers from overfitting, its complexity can be controlled by shrinking the coefficient estimates towards zero. Regularization achieves this by adding a penalty to the model’s loss function.

The commonly used regularization techniques are:

  1. L1 Regularization
  2. L2 Regularization
  3. Dropout Regularization

L1 regularization shrinks the coefficients of the features towards zero by penalizing their absolute values. This penalty term is added to the loss function during training, and the coefficients of the least useful features are driven to exactly zero first. This type of regularization is useful when we want sparse models, i.e. models that rely on only a few features.

L2 regularization shrinks the coefficients of the features towards zero by penalizing their squared values. This penalty term is added to the loss function during training. Unlike L1, it rarely sets coefficients exactly to zero; it simply keeps all of them small. This type of regularization is useful when we want to avoid overfitting by reducing the complexity of the model.

Elastic net regularization shrinks the coefficients of the features towards zero by penalizing them with both an absolute value term and a squared value term added to the loss function during training. It is useful when we want a sparse model and, at the same time, want to avoid overfitting by reducing the complexity of the model.

We will focus on L1 and L2 regularization in this article.

L2 Regularization

A linear regression that uses the L2 regularization technique is called ridge regression. In ridge regression, the squared magnitude of the coefficients is added as a penalty term to the loss function (L). L2 regularization keeps the model’s weights close to zero, but not exactly zero, so every feature has only a small impact on the output while the model stays as accurate as possible.
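Written out, with ŷi denoting the model’s prediction for the i-th example, the ridge loss is the ordinary least-squares loss plus the sum of squared weights:

Loss = Σi (yi − ŷi)² + λ Σj wj²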

Here λ controls the strength of regularization and wj are the model’s weights (coefficients). Increasing λ makes the model flatter and more prone to underfitting; decreasing λ lets the model overfit more, and with λ = 0 the regularization term is eliminated entirely.

L1 Regularization

Lasso (Least Absolute Shrinkage and Selection Operator) is another method to regularize overfitted models. Lasso regression adds a penalty term to the cost function of linear regression in a way that drives some coefficients to exactly zero, which means the model ignores those features. By discarding the least important features, the model often becomes more accurate; in effect, lasso regression performs feature selection.
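The lasso loss has the same structure but penalizes the absolute values of the weights instead of their squares:

Loss = Σi (yi − ŷi)² + λ Σj |wj|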

Here λ controls the strength of regularization and wj are the model’s weights (coefficients).

Elastic Net

The Elastic Net regularization is a combination of ridge and lasso regularization. Even though combining the L1 and L2 techniques often works better than either alone, it can be difficult to balance the two penalties against each other via the mixing ratio r in the following equation.
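One common way to write it, with r mixing the two penalties (r = 1 gives pure lasso, r = 0 gives pure ridge):

Loss = Σi (yi − ŷi)² + λ ( r Σj |wj| + (1 − r) Σj wj² )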

How to Implement Regularization in Machine Learning

Below is a demonstration in Python of the L2 and L1 regularization techniques on the Boston Housing dataset, comparing the training set score and test set score before and after applying them.

Here we are loading the dataset to split it into training and test set:
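A minimal sketch of this step (it assumes scikit-learn older than 1.2, where load_boston is still available; in newer versions the dataset has been removed and must be fetched from another source such as OpenML):

```python
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

# load_boston ships with scikit-learn < 1.2; newer versions no longer
# include it, so the data would have to come from OpenML or another source.
boston = load_boston()
X, y = boston.data, boston.target

# Hold out a test set so training and test performance can be compared.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```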

Now, let’s train the linear regression model and print the training score and test score:
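A sketch of this step, reusing the split from above:

```python
from sklearn.linear_model import LinearRegression

# Plain, unregularized linear regression as the baseline.
lr = LinearRegression().fit(X_train, y_train)

# score() returns R², so higher is better.
print("Training set score: {:.2f}".format(lr.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lr.score(X_test, y_test)))
```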

Comparing the training set score with the test set score shows that the model suffers from overfitting: it fits the training data noticeably better than it predicts the held-out test data.

To counter this overfitting and the resulting loss of accuracy on new data, let’s use the L2 (ridge) regression technique:
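A sketch of the ridge version, fitted on the same split:

```python
from sklearn.linear_model import Ridge

# Ridge regression with the default regularization strength (alpha=1.0).
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

print("Training set score: {:.2f}".format(ridge.score(X_train, y_train)))
print("Test set score: {:.2f}".format(ridge.score(X_test, y_test)))
```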

The ridge regression training set score is slightly lower than that of plain linear regression, but the test set score improves noticeably, which means the overfitting has decreased along with the model’s complexity.

The alpha parameter controls the trade-off between the model’s performance on the training set and its simplicity. Increasing the alpha value (its default is 1.0) simplifies the model by shrinking the coefficients more strongly.
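A quick illustrative sketch of that effect, sweeping a few alpha values:

```python
# Larger alpha -> stronger shrinkage -> simpler model;
# smaller alpha -> behaviour closer to plain linear regression.
for alpha in [0.1, 1.0, 10.0]:
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    print("alpha={:<4} train={:.2f} test={:.2f}".format(
        alpha, ridge.score(X_train, y_train), ridge.score(X_test, y_test)))
```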

Now, let’s apply lasso regression to this dataset:
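A sketch with lasso’s default settings:

```python
from sklearn.linear_model import Lasso

# Lasso with the default regularization strength (alpha=1.0).
lasso = Lasso(alpha=1.0).fit(X_train, y_train)

print("Training set score: {:.2f}".format(lasso.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lasso.score(X_test, y_test)))
```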

When the alpha parameter is left at 1.0, lasso regression performs very poorly: it sets almost all the coefficients to zero and underfits badly.

Let’s decrease the alpha parameter to 0.01 and see how it works:
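A sketch of the same model with the weaker penalty (max_iter is raised so the solver converges cleanly; the exact value is just a safe choice):

```python
# A much weaker L1 penalty, so fewer coefficients are forced to zero.
lasso001 = Lasso(alpha=0.01, max_iter=100000).fit(X_train, y_train)

print("Training set score: {:.2f}".format(lasso001.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lasso001.score(X_test, y_test)))
```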

For the two lasso models above, with the alpha parameter set to 1.0 and 0.01 respectively, the number of features the model actually uses can be checked with this code:
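A sketch of that check, assuming the lasso and lasso001 models fitted above:

```python
import numpy as np

# A feature counts as "used" when its coefficient is non-zero.
print("Features used (alpha=1.0): ", np.sum(lasso.coef_ != 0))
print("Features used (alpha=0.01):", np.sum(lasso001.coef_ != 0))
```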

Can you spot the significant difference between the number of features used by the two lasso models? The second model is more accurate while still remaining relatively simple.

Pros and Cons of Regularization

When it comes to regularization in machine learning, there are a few key things to consider. On the one hand, regularization can help prevent overfitting by adding constraints to the model and encouraging simpler models. On the other hand, too much regularization can lead to underfitting by making the model overly constrained. In addition, regularization can add computational cost and slow down training.

Overall, there are a few key pros and cons to consider when it comes to regularization in machine learning. When used correctly, regularization can be a powerful tool for preventing overfitting. However, it is important to understand the potential downsides of regularization before using it in your models.

Conclusion

Regularization is an important concept in machine learning that can help you avoid overfitting your data. By adding a regularization term to your objective function, you can encourage your model to prefer simpler solutions that are less likely to overfit. There are many different types of regularization methods available, so it’s important to experiment and find the one that works best for your data and task. With a little understanding of regularization, you can build better machine-learning models that generalize well to new data.
