Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

More Data, Less Accuracy

DZone's Guide to

More Data, Less Accuracy

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

Statistical methods should do better with more data. That’s essentially what the technical term “consistency” means. But with improper numerical techniques, the the numerical error can increase with more data, overshadowing the decreasing statistical error.

There are three ways Bayesian posterior probability calculations can degrade with more data:

  1. Polynomial approximation
  2. Missing the spike
  3. Underflow

Elementary numerical integration algorithms, such as Gaussian quadrature, are based on polynomial approximations. The method aims to exactly integrate a polynomial that approximates the integrand. But likelihood functions are not approximately polynomial, and they become less like polynomials when they contain more data. They become more like a normal density, asymptotically flat in the tails, something no polynomial can do. With better integration techniques, the integration accuracy will improve with more data rather than degrade.

With more data, the posterior distribution becomes more concentrated. This means that a naive approach to integration might entirely miss the part of the integrand where nearly all the mass is concentrated. You need to make sure your integration method is putting its effort where the action is. Fortunately, it’s easy to estimate where the mode should be.

The third problem is that software calculating the likelihood function can underflow with even a moderate amount of data. The usual solution is to work with the logarithm of the likelihood function, but with numerical integration the solution isn’t quite that simple. You need to integrate the likelihood function itself, not its logarithm. I describe how to deal with this situation in Avoiding underflow in Bayesian computations.

***

If you’d like help with statistical computation, let’s talk.

Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.

Topics:

Published at DZone with permission of John Cook, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}