In Thinking, Fast and Slow, Daniel Kahneman comments on The robust beauty of improper linear models in decision making by Robyn Dawes. According to Dawes, or at least Kahneman’s summary of Dawes, simply averaging a few relevant predictors may work as well or better than a proper regression model.
One can do just as well by selecting a set of scores that have some validity for predicting the outcome and adjusting the values to make them comparable (by using standard scores or ranks). A formula that combines these predictors with equal weights is likely to be just as accurate in predicting new cases as the multiple-regression model that was optimal in the original sample. More recent research went further: formulas that assign equal weights to all the predictors are often superior, because they are not affected by accidents of sampling.
If the data really do come from an approximately linear system, and you’ve identified the correct variables, then linear regression is optimal in some sense. If a simple-minded approach works nearly as well, one of these assumptions is wrong.
- Maybe the system isn’t approximately linear. In that case it would not be surprising that the best fit of an inappropriate model doesn’t work better than a crude fit.
- Maybe the linear regression model is missing important predictors or has some extraneous predictors that are adding noise.
- Maybe the system is linear, you’ve identified the right variables, but the application of your model is robust to errors in the coefficients.
Regarding the first point, it can be hard to detect nonlinearities when you have several regression variables. It is especially hard to find nonlinearities when you assume that they must not exist.
Regarding the last point, depending on the purpose you put your model to, an accurate fit might not be that important. If the regression model is being used as a classifier, for example, maybe you could do about as good a job at classification with a crude fit.
The context of Dawes’ paper, and Kahneman’s commentary on it, is a discussion of clinical judgement versus simple formulas. Neither author is discouraging regression but rather saying that a simple formula can easily outperform clinical judgment in some circumstances.