Over a million developers have joined DZone.
Platinum Partner

Robustness of equal weights

· Performance Zone

The Performance Zone is brought to you in partnership with New Relic. Quickly learn how to use Docker and containers in general to create packaged images for easy management, testing, and deployment of software.

In Thinking, Fast and Slow, Daniel Kahneman comments on The robust beauty of improper linear models in decision making by Robyn Dawes. According to Dawes, or at least Kahneman’s summary of Dawes, simply averaging a few relevant predictors may work as well or better than a proper regression model.

One can do just as well by selecting a set of scores that have some validity for predicting the outcome and adjusting the values to make them comparable (by using standard scores or ranks). A formula that combines these predictors with equal weights is likely to be just as accurate in predicting new cases as the multiple-regression model that was optimal in the original sample. More recent research went further: formulas that assign equal weights to all the predictors are often superior, because they are not affected by accidents of sampling.

If the data really do come from an approximately linear system, and you’ve identified the correct variables, then linear regression is optimal in some sense. If a simple-minded approach works nearly as well, one of these assumptions is wrong.

  1. Maybe the system isn’t approximately linear. In that case it would not be surprising that the best fit of an inappropriate model doesn’t work better than a crude fit.
  2. Maybe the linear regression model is missing important predictors or has some extraneous predictors that are adding noise.
  3. Maybe the system is linear, you’ve identified the right variables, but the application of your model is robust to errors in the coefficients.

Regarding the first point, it can be hard to detect nonlinearities when you have several regression variables. It is especially hard to find nonlinearities when you assume that they must not exist.

Regarding the last point, depending on the purpose you put your model to, an accurate fit might not be that important. If the regression model is being used as a classifier, for example, maybe you could do about as good a job at classification with a crude fit.

The context of Dawes’ paper, and Kahneman’s commentary on it, is a discussion of clinical judgement versus simple formulas. Neither author is discouraging regression but rather saying that a simple formula can easily outperform clinical judgment in some circumstances.

The Performance Zone is brought to you in partnership with New Relic. Read more about providing a framework that gets you started on the right path to move your IT services to cloud computing, and give you an understanding as to why certain applications should not move to the cloud.


Published at DZone with permission of John Cook , DZone MVB .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}