Consider the situation where we have information about the dietary habits of a group of patients, given as vectors $X_1, …, X_n$, together with their blood pressure levels $y_1, …, y_n$. We want to know what effect the $X_i$'s have on the $y_i$'s; in other words, we treat each $X_i$ as a predictor for the outcome $y_i$.
We give an overview of regression as a process for describing the relation between predictors and the outcome. We explain what overfitting means in this context and how regularization is used to prevent it. Then we introduce the Lasso (least absolute shrinkage and selection operator), which fits a linear model by minimizing a squared-error loss plus an $\ell_1$ regularization term, and we discuss the intuition behind the idea.
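As a concrete sketch of the objective just described, the following minimizes $\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$ by coordinate descent with soft-thresholding, one standard way to fit the Lasso (assuming NumPy; the function names `lasso_cd` and `soft_threshold` and the synthetic data are illustrative, not from the text):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X beta||^2 + lam * ||beta||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * X_j^T X_j for each column j
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove column j's current contribution.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            # One-dimensional Lasso subproblem has a closed-form solution.
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Synthetic data in the spirit of the example: 5 dietary features,
# only the first two actually influence blood pressure.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(200)

beta = lasso_cd(X, y, lam=0.1)
print(np.round(beta, 3))  # irrelevant coefficients are shrunk to (near) zero
```

The parameter $\lambda$ (`lam`) controls the trade-off: larger values shrink more coefficients exactly to zero, which is the "selection" part of the name, at the cost of some bias in the remaining coefficients.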