Stepwise Regression
What Is Stepwise Regression?
Stepwise regression is the step-by-step iterative construction of a regression model that involves selecting the independent variables to be used in a final model. It involves adding or removing potential explanatory variables in succession and testing for statistical significance after each iteration.
The availability of statistical software packages makes stepwise regression possible, even in models with hundreds of variables.
Types of Stepwise Regression
The underlying goal of stepwise regression is, through a series of tests (e.g., F-tests, t-tests), to find a set of independent variables that significantly influence the dependent variable. This is done with computers through iteration, the process of arriving at results or decisions by going through repeated rounds or cycles of analysis. Conducting the tests automatically with statistical software packages has the advantage of saving time and limiting errors.
Stepwise regression can be achieved either by trying out one independent variable at a time and including it in the regression model if it is statistically significant, or by including all potential independent variables in the model and eliminating those that are not statistically significant. Some use a combination of the two methods, and therefore there are three approaches to stepwise regression:
- Forward selection begins with no variables in the model, tests each variable as it is added, then keeps those that are deemed statistically significant, repeating the process until the results are optimal (a code sketch follows this list).
- Backward elimination begins with a set of independent variables, deleting one at a time, then testing to see whether the removed variable is statistically significant.
- Bidirectional elimination is a combination of the first two methods, testing which variables should be included in or excluded from the model.
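As an illustration, the sketch below implements a basic forward selection loop in Python using ordinary least squares from statsmodels. The helper function, toy data, and 0.05 significance threshold are assumptions added here for illustration, not part of the original article; the sketch simply shows how a variable is added only when its t-test p-value clears a chosen cutoff.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm


def forward_selection(X, y, alpha=0.05):
    """Hypothetical forward-selection helper: start with no predictors and,
    at each step, add the candidate with the smallest p-value, stopping when
    no remaining variable is significant at the alpha level."""
    selected = []
    remaining = list(X.columns)
    while remaining:
        pvalues = {}
        for candidate in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [candidate]])).fit()
            pvalues[candidate] = model.pvalues[candidate]
        best = min(pvalues, key=pvalues.get)
        if pvalues[best] < alpha:     # keep only statistically significant additions
            selected.append(best)
            remaining.remove(best)
        else:
            break                     # nothing left clears the threshold
    return selected


# Toy data: y depends on x1 and x2 but not on x3 (illustrative only)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
y = 2.0 * X["x1"] - 1.5 * X["x2"] + rng.normal(size=200)
print(forward_selection(X, y))        # typically ['x1', 'x2']
```

Backward elimination works the same way in reverse, starting from the full set of candidates; a sketch of that variant appears after the example below.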
Example
An example of stepwise regression using the backward elimination method would be an attempt to understand energy usage at a factory using variables such as equipment run time, equipment age, staff size, outside temperature, and time of year. The model includes all of the variables, then each is removed, one at a time, to determine which is least statistically significant. Eventually, the model could show that time of year and temperature are the most significant, possibly suggesting that peak energy consumption at the factory occurs when air conditioner usage is at its highest.
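A minimal backward elimination loop along the lines of this example might look as follows. The variable names, simulated data, and 0.05 threshold are assumptions made for illustration, not measurements from any real factory.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm


def backward_elimination(X, y, alpha=0.05):
    """Hypothetical backward-elimination helper: start with all candidate
    predictors and repeatedly drop the least significant one until every
    remaining variable is significant at the alpha level."""
    cols = list(X.columns)
    while cols:
        model = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvalues = model.pvalues.drop("const")   # ignore the intercept
        worst = pvalues.idxmax()
        if pvalues[worst] > alpha:
            cols.remove(worst)                  # drop the least significant variable
        else:
            break
    return cols


# Illustrative stand-ins for the factory variables described above
rng = np.random.default_rng(1)
n = 365
X = pd.DataFrame({
    "run_time": rng.normal(8, 2, n),
    "equipment_age": rng.normal(10, 3, n),
    "staff_size": rng.normal(50, 5, n),
    "outside_temp": rng.normal(20, 8, n),
})
# In this toy data, energy usage is driven mainly by outside temperature
y = 30 * X["outside_temp"] + 5 * X["run_time"] + rng.normal(0, 40, n)
print(backward_elimination(X, y))               # e.g. ['run_time', 'outside_temp']
```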
Limitations of Stepwise Regression
Regression analysis, both linear and multivariate, is widely used in the economics and investment world today. The idea is often to find patterns that existed in the past that could also recur in the future. A simple linear regression, for example, could examine price-to-earnings ratios and stock returns over many years to determine whether stocks with low P/E ratios (independent variable) offer higher returns (dependent variable). The problem with this approach is that market conditions often change, and relationships that held in the past don't necessarily hold in the present or future.
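For concreteness, such a regression could be fitted as in the sketch below. The data are simulated purely for illustration; no claim is made about the actual relationship between P/E ratios and returns.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated annual returns regressed on trailing P/E ratios (toy data only)
rng = np.random.default_rng(2)
pe_ratio = pd.Series(rng.uniform(5, 40, 100), name="pe_ratio")
returns = 0.15 - 0.002 * pe_ratio + rng.normal(0, 0.05, 100)

model = sm.OLS(returns, sm.add_constant(pe_ratio)).fit()
print(model.params)    # the slope's sign indicates whether low-P/E stocks earned more
print(model.pvalues)   # and whether that relationship is statistically significant
```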
Meanwhile, the stepwise regression process has numerous critics, and there are even calls to stop using the method altogether. Analysts note several downsides to the approach, including incorrect results, an inherent bias in the process itself, and the need for significant computing power to develop complex regression models through iteration.
Key Takeaways
- Stepwise regression has its downsides, however, as it is an approach that fits data to a model in order to achieve the desired result.
- The backward elimination method begins with a full model loaded with several variables and then removes one variable at a time to test its importance relative to overall results.
- Stepwise regression is a method that iteratively examines the statistical significance of each independent variable in a linear regression model.
- The forward selection approach begins with nothing and adds each new variable incrementally, testing for statistical significance.