Linear regression models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data. The model rests on five assumptions, and this blog post walks through each of them.
Linearity
Each feature should have a linear relationship with the target variable, which can be checked with a scatter plot of the feature against the target. It is also important to check for outliers, since linear regression is sensitive to them.
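As a quick numeric companion to the scatter plot, the Pearson correlation coefficient gives a rough read on linearity. A minimal sketch on synthetic data (the feature and target here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: a feature with a roughly linear relationship to the target.
x = rng.uniform(0, 10, 200)
y = 3.0 * x + rng.normal(0, 1.0, 200)  # linear signal plus noise

# Pearson correlation: values near +1 or -1 suggest a linear relationship.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 2))
```

Keep in mind that correlation only captures linear association; the scatter plot remains the better tool for spotting curvature or outliers.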
Normality
Each feature should be approximately normally distributed. If a feature is skewed, applying a transformation such as a logarithmic transformation can bring it closer to a normal distribution.
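A short sketch of what that looks like in practice, assuming a right-skewed synthetic feature: a log transform pulls in the long tail, which we can verify with a simple skewness estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# A right-skewed synthetic feature (income-like data often looks like this).
feature = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# np.log1p (log of 1 + x) handles zero values safely.
transformed = np.log1p(feature)

def skewness(a):
    # Crude skewness estimate: the third standardized moment.
    a = np.asarray(a, dtype=float)
    return float(np.mean(((a - a.mean()) / a.std()) ** 3))

print(skewness(feature) > skewness(transformed))  # the transform reduces skew
```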
No or little multicollinearity
Multicollinearity occurs when independent features are highly correlated with one another, whether positively or negatively. Pairwise correlations can be checked with df.corr(). If two features are too highly correlated, you may choose to drop one, or combine them, for example by averaging them into a new feature and dropping the originals.
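A minimal sketch of both the check and the averaging remedy, using invented feature names on synthetic data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Two nearly duplicate features plus one unrelated feature.
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.05, size=300)  # almost a copy of x1
x3 = rng.normal(size=300)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

corr = df.corr()
print(corr.loc["x1", "x2"] > 0.9)  # flags the multicollinear pair

# One simple remedy: average the pair into a new feature and drop the originals.
df["x12_mean"] = df[["x1", "x2"]].mean(axis=1)
df = df.drop(columns=["x1", "x2"])
```

In real projects you would scan the full correlation matrix (or a heatmap of it) rather than checking one pair.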
Independence of residuals
The residuals need to be independent of one another: the residual at one observation should carry no information about the residual at the next, so the value of y(x) must be independent of the value of y(x+1).
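One simple way to probe this is the lag-1 autocorrelation of the residuals, which should be near zero. A sketch comparing independent noise against deliberately autocorrelated residuals (an AR(1) process, built here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Residuals from a well-specified model should look like uncorrelated noise.
independent = rng.normal(size=500)

# Autocorrelated residuals (AR(1) process) violate the assumption.
correlated = np.empty(500)
correlated[0] = rng.normal()
for t in range(1, 500):
    correlated[t] = 0.9 * correlated[t - 1] + rng.normal()

def lag1_autocorr(res):
    # Correlation between each residual and the one that follows it.
    res = res - res.mean()
    return float(np.sum(res[:-1] * res[1:]) / np.sum(res ** 2))

print(abs(lag1_autocorr(independent)) < 0.2)  # near zero for independent noise
print(lag1_autocorr(correlated) > 0.5)        # large for autocorrelated residuals
```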
Homoscedasticity
The residuals should have the same variance across the whole range of the regression line (homoscedasticity). If their spread grows or shrinks with the predicted value, the assumption is violated.
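A minimal sketch of spotting a violation, using synthetic data where the noise deliberately grows with x: fit a line, then compare the residual spread in the low-x and high-x halves (in practice you would plot residuals against fitted values instead).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
x = np.sort(rng.uniform(0, 10, n))

# Heteroscedastic data: the noise scale grows with x, violating the assumption.
y = 2.0 * x + rng.normal(scale=0.5 + 0.5 * x, size=n)

# Fit a straight line and compute the residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Compare residual spread between the low-x and high-x halves.
low_spread = residuals[: n // 2].std()
high_spread = residuals[n // 2 :].std()
print(high_spread > 1.5 * low_spread)  # unequal spread signals heteroscedasticity
```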