Predictive analytics uses historical and current data to predict future or unknown outcomes. Two common examples are customer churn prediction (predicting whether a customer might end their relationship with a company) and sales forecasting (predicting demand for a product from information such as its sales history, regions of demand, seasonality, and competitors’ new products in the market). Regression is one of the techniques used in predictive analytics.
Regression is a statistical technique and a supervised learning algorithm that models the relationship among variables. What are these variables? There are two types:
- Independent variable: usually denoted ‘x’; these are the inputs used to make the prediction.
- Dependent variable: usually denoted ‘y’; this is the value being predicted and is continuous in nature.
Regression models the relationship between one or more dependent variables and one or more independent variables. It tries to fit a best-fit line to the training data, with the aim of keeping the prediction error low.
Here are some instances where we can use regression. Can I predict how happy (as a real number) a kid is on their birthday, based on the number of gifts, the size of the gifts, the likability of the gifts, etc.? If I know this, I can plan better for next year’s birthday. Another example: can we predict how satisfied employees are, based on the type of job, type of projects, work-life balance, etc.?
In these examples, the dependent variables (y) are happiness and satisfaction. The independent variables (x’s) are the number of gifts, size of the gifts, and likability of the gifts in the first example, and the type of job, type of projects, and work-life balance in the second.
Types of regression.
There are many types of regression models, including:
- Linear Regression
- Polynomial Regression
- Ridge Regression
- Lasso Regression
- Elastic Net
Let’s start with Linear Regression
Linear regression models the relationship between a dependent variable (y) and one or more independent variables (x) by fitting a best-fit regression line. Under the hood, the prediction is a weighted sum of the x values plus a bias (intercept) term.
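To make the "weighted sum plus intercept" idea concrete, here is a minimal sketch in NumPy. The data, the weights, and the bias are made-up values for illustration only (they are not fitted to anything), and the `predict` helper is a hypothetical name.

```python
import numpy as np

def predict(X, weights, bias):
    """Linear regression prediction: weighted sum of the features plus an intercept."""
    return X @ weights + bias

# Hypothetical data: 3 birthdays described by (number of gifts, size of the gifts)
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 2.0]])
weights = np.array([0.5, 1.0])  # illustrative weights, one per feature
bias = 1.0                      # illustrative intercept

y_pred = predict(X, weights, bias)
print(y_pred)  # one predicted happiness score per row of X
```

Training a linear regression model means finding the weights and bias that make these predictions as close as possible to the actual y values.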
Fig 1 shows data points, intercept, best fit line and the error between the actual and predicted values.
One of the error metrics for regression is Root Mean Squared Error (RMSE). RMSE is the square root of the average of the squared errors, where an error is the difference between the predicted and actual values.
Other error metrics for regression are Mean Absolute Error, Mean Squared Error, R² (the coefficient of determination), and adjusted R².
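The metrics listed above can be sketched in a few lines of NumPy. The function name `regression_metrics` and the sample values are assumptions made for illustration; the adjusted R² uses the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of features.

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_features):
    """Compute common regression error metrics for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    errors = y_pred - y_true

    mse = np.mean(errors ** 2)          # Mean Squared Error
    rmse = np.sqrt(mse)                 # Root Mean Squared Error
    mae = np.mean(np.abs(errors))       # Mean Absolute Error

    ss_res = np.sum(errors ** 2)                      # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2, "adj_r2": adj_r2}

# Made-up actual and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred, n_features=1)
print(metrics)
```

RMSE and MAE are in the same units as y, which makes them easier to interpret than MSE; R² and adjusted R² are unitless.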
But how do we get the best-fit line and reduce the RMSE? We can use the Normal Equation, or optimization algorithms like gradient descent (which will be discussed in upcoming blogs).
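As a rough sketch of the Normal Equation approach: if X_b is the feature matrix with a leading column of ones (for the intercept), the least-squares parameters are θ = (X_bᵀ X_b)⁻¹ X_bᵀ y. The synthetic data below (a true intercept of 4 and slope of 3 plus noise) is invented purely for the demonstration.

```python
import numpy as np

# Synthetic data: y = 4 + 3x + noise (values chosen only for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)
y = 4.0 + 3.0 * x + rng.normal(0, 0.5, size=100)

# Add a column of ones so the first parameter acts as the intercept
X_b = np.column_stack([np.ones_like(x), x])

# Normal Equation: solve (X_b^T X_b) theta = X_b^T y
# (solving the linear system is numerically safer than forming the inverse)
theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
intercept, slope = theta
print(intercept, slope)
```

The recovered intercept and slope should land near 4 and 3, with small deviations caused by the noise.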
What happens when a non-linear relationship exists between X and Y? If linear regression is used, the prediction error will be high. In such cases, we can use polynomial regression, which adds polynomial features (powers of the existing features) alongside the existing features, as shown in fig 3.
The degree of the above equation is 2.
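Here is a small sketch of the idea, assuming a single feature x and noise-free quadratic data invented for the example. The helper names are hypothetical. A straight line cannot follow the curve, while adding an x² feature lets the same least-squares fit recover the relationship.

```python
import numpy as np

def polynomial_features(x, degree):
    """Build columns [1, x, x^2, ..., x^degree] from a 1-D feature."""
    return np.column_stack([x ** d for d in range(degree + 1)])

def fit_least_squares(X, y):
    """Ordinary least-squares fit; returns the parameter vector."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

# Made-up non-linear data: y = 2 + 0.5x + 1.5x^2 (no noise, for clarity)
x = np.linspace(-3, 3, 50)
y = 2.0 + 0.5 * x + 1.5 * x ** 2

X_lin = polynomial_features(x, degree=1)    # plain linear regression
X_quad = polynomial_features(x, degree=2)   # adds the x^2 feature

theta_lin = fit_least_squares(X_lin, y)
theta_quad = fit_least_squares(X_quad, y)

rmse_lin = rmse(y, X_lin @ theta_lin)
rmse_quad = rmse(y, X_quad @ theta_quad)
print(rmse_lin, rmse_quad)
```

Note that the model is still linear in its parameters; only the features are non-linear in x, which is why the same fitting procedure applies.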
With polynomial regression, increasing the degree of the equation to fit the training data perfectly can lead to overfitting.
A model is said to be overfitted when it is tailored to the training data to such a degree that it fails to generalize to new, unseen data.
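One way to see this is to fit polynomials of increasing degree to a small noisy sample and compare the error on the training points with the error on held-out points. The sketch below uses invented quadratic data with noise, and the helper names are hypothetical. The training error can only go down as the degree increases; the held-out error usually does not follow, although the exact numbers depend on the random sample.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit using columns [1, x, ..., x^degree]."""
    X = np.vander(x, degree + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def rmse(x, y, theta):
    """RMSE of the polynomial with parameters theta on the points (x, y)."""
    X = np.vander(x, len(theta), increasing=True)
    return float(np.sqrt(np.mean((X @ theta - y) ** 2)))

rng = np.random.default_rng(42)

def make_data(x):
    # Made-up ground truth (a quadratic) plus random noise
    return 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, 0.3, size=x.shape)

x_train = np.linspace(-1, 1, 10)      # small training set
y_train = make_data(x_train)
x_val = np.linspace(-0.95, 0.95, 50)  # held-out points
y_val = make_data(x_val)

results = {}
for degree in (1, 2, 9):
    theta = fit_polynomial(x_train, y_train, degree)
    results[degree] = (rmse(x_train, y_train, theta), rmse(x_val, y_val, theta))
    print(degree, results[degree])
```

With 10 training points, the degree-9 polynomial passes through every one of them, so its training error is essentially zero. That is the "tailored to the training data" situation described above; the held-out error is the number to watch.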
One of the disadvantages of polynomial regression is that it is sensitive to outliers.
In simple terms, outliers are observations that are not similar to the normal observations. They can come from one-off events, errors in recording observations, etc.
How do we avoid overfitting? Regularization is a technique to prevent overfitting.
Ridge Regression is a regularized form of linear regression. Its cost function is modified by adding a penalty: the sum of the squared coefficients, scaled by a hyperparameter α. What happens with that? The model tries to fit the data while keeping the coefficients as small as possible, so that the overall cost stays low. This reduces the complexity of the model.
α = 0 means no regularization (equivalent to plain linear regression).
A very large α means all the coefficients will be close to zero.
From the above figure, we can see that we obtain different regression lines with different values of alpha.
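The effect of α can also be sketched numerically. The example below uses a closed-form ridge solution, θ = (XᵀX + αA)⁻¹ Xᵀy, where A is the identity matrix with a zero in the intercept position so that the intercept is not penalized. This assumes the cost is written as the sum of squared errors plus α times the sum of squared coefficients; other scalings of the penalty only change what a given α value means. The data and the `ridge_fit` helper are invented for the illustration.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; returns [intercept, coefficients...]."""
    n_samples = X.shape[0]
    X_b = np.column_stack([np.ones(n_samples), X])  # add intercept column
    penalty = np.eye(X_b.shape[1])
    penalty[0, 0] = 0.0                             # do not penalize the intercept
    return np.linalg.solve(X_b.T @ X_b + alpha * penalty, X_b.T @ y)

# Made-up data: y = 1 + 2x + noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 3, size=(50, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 0.5, size=50)

alphas = [0.0, 10.0, 1000.0, 1e6]
fits = [ridge_fit(X, y, alpha) for alpha in alphas]
slopes = [theta[1] for theta in fits]

for alpha, theta in zip(alphas, fits):
    print(alpha, theta)  # intercept and slope for each alpha
```

With α = 0 the result matches ordinary least squares. As α grows the slope shrinks toward zero, and the fitted line flattens toward the mean of y.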
Lasso Regression (least absolute shrinkage and selection operator)
Lasso regression is another regularized linear regression model. Its cost function is modified by adding the sum of the absolute values of the coefficients, scaled by α. There are two important points to consider:
- Avoids overfitting.
- Feature elimination: the weights of the least important features tend to be set to exactly zero, which effectively removes those features from the model.
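Feature elimination can be illustrated with a small coordinate-descent solver. This is only a sketch under stated assumptions: it minimizes (1/2n)·‖y − Xw − b‖² + α·Σ|w_j| (one common scaling of the Lasso cost), it uses a fixed number of passes instead of a convergence check, and the data is synthetic, with one deliberately weak feature. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Lasso fit by cyclic coordinate descent; returns (weights, intercept)."""
    n_samples, n_features = X.shape
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean           # centering handles the intercept
    w = np.zeros(n_features)
    z = (Xc ** 2).sum(axis=0) / n_samples     # per-feature scale
    for _ in range(n_iters):
        for j in range(n_features):
            # Residual with feature j's current contribution added back
            partial_residual = yc - Xc @ w + Xc[:, j] * w[j]
            rho = Xc[:, j] @ partial_residual / n_samples
            w[j] = soft_threshold(rho, alpha) / z[j]
    intercept = y_mean - X_mean @ w
    return w, intercept

# Synthetic data: the second feature barely matters (true weight 0.05)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.05 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(0, 0.1, size=200)

w_ols, _ = lasso_coordinate_descent(X, y, alpha=0.0)    # no regularization
w_lasso, _ = lasso_coordinate_descent(X, y, alpha=0.5)  # with the L1 penalty
print(w_ols)
print(w_lasso)
```

Without the penalty, all three weights are non-zero. With it, the two strong weights shrink and the weak one is driven to exactly zero, which is the feature elimination described above.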
Elastic Net is a mix of both Ridge and Lasso regression. By controlling the value of r in the regularization term, we can decide the contribution of each type of regression.
r = 0: Elastic Net is equivalent to Ridge regression
r = 1: Elastic Net is equivalent to Lasso regression
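The two limiting cases can be checked directly on the regularization term. The sketch below assumes the convention in which the Ridge penalty is α·½·Σθ², the Lasso penalty is α·Σ|θ|, and Elastic Net mixes them as r·(Lasso penalty) + (1 − r)·(Ridge penalty). Other libraries scale these terms differently, so treat the exact constants as an assumption. The coefficient values are made up.

```python
import numpy as np

def ridge_penalty(theta, alpha):
    """L2 penalty: half the sum of squared coefficients, scaled by alpha."""
    return alpha * 0.5 * np.sum(theta ** 2)

def lasso_penalty(theta, alpha):
    """L1 penalty: sum of absolute coefficients, scaled by alpha."""
    return alpha * np.sum(np.abs(theta))

def elastic_net_penalty(theta, alpha, r):
    """Mix of the two penalties, controlled by the mix ratio r in [0, 1]."""
    return r * lasso_penalty(theta, alpha) + (1 - r) * ridge_penalty(theta, alpha)

theta = np.array([0.5, -2.0, 0.0, 1.5])  # made-up coefficients (intercept excluded)
alpha = 0.1

for r in (0.0, 0.5, 1.0):
    print(r, elastic_net_penalty(theta, alpha, r))
```

Intermediate values of r blend the two behaviors: some shrinkage of all coefficients from the Ridge part, and some coefficients pushed to zero by the Lasso part.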
Thank you for reading.
- Géron, A. (2019). Ridge Regression [Image]. In Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow (p. 138).
- Géron, A. (2019). Ridge Regression [Image]. In Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow (p. 140).