DZone
Big Data Zone

Big Data and Humility

by John Cook · Nov. 03, 11 · Big Data Zone · Interview

One of the challenges with big data is to properly estimate your uncertainty. Often “big data” means a huge amount of data that isn’t exactly what you want.

As an example, suppose you have data on how a drug acts in monkeys and you want to infer how the drug acts in humans. There are two sources of uncertainty:

  1. How well do we really know the effects in monkeys?
  2. How well do these results translate to humans?

The former can be quantified, and so we focus on that, but the latter may be more important. There’s a strong temptation to believe that big data regarding one situation tells us more than it does about an analogous situation.

I’ve seen people reason as follows. We don’t really know how results translate from monkeys to humans (or from one chemical to a related chemical, from one market to an analogous market, etc.). We have a moderate amount of data on monkeys, so we’ll decimate it, counting every ten monkey observations as one, and use that as if it were human data, say in order to come up with a prior distribution.

Down-weighting by a fixed ratio, such as 10 to 1, is misleading. If you had 10x as much data on monkeys, would you know as much about effects in humans as if the original smaller data set were collected on people? What if you suddenly had “big data” involving every monkey on the planet? More data on monkeys drives down your uncertainty about monkeys, but does nothing to lower your uncertainty regarding how monkey results translate to humans.
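To make the contrast concrete, here is a minimal sketch under assumptions that are not in the original post: a normal model with known standard deviation sigma, and a monkey-to-human translation error modeled as an independent normal term with standard deviation tau. The function names and the numbers are hypothetical, chosen only for illustration.

```python
import math

def discounted_sd(sigma, n_monkeys, ratio=10.0):
    """Standard error when n_monkeys observations are treated as
    n_monkeys / ratio human observations (fixed-ratio down-weighting)."""
    return sigma / math.sqrt(n_monkeys / ratio)

def floored_sd(sigma, n_monkeys, tau):
    """Standard error for the human effect when translation uncertainty
    (assumed standard deviation tau) is modeled as a separate term."""
    return math.sqrt(sigma**2 / n_monkeys + tau**2)

# Hypothetical values: sigma = 1.0 and tau = 0.5 are illustrative assumptions.
for n in (100, 10_000, 1_000_000):
    print(n, discounted_sd(1.0, n), floored_sd(1.0, n, 0.5))
```

Under these assumptions, the fixed-ratio standard error keeps shrinking toward zero as the monkey data grows, while the second one never drops below tau, however many monkeys are measured.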

At some point, more data about analogous cases reaches diminishing returns, and you can’t go further without data about what you really want to know. Collecting more and more data about how a drug works in adults won’t help you learn how it works in children. At some point, you need to treat children. Terabytes of analogous data may not be as valuable as kilobytes of highly relevant data.

Source:  http://www.johndcook.com/blog/2011/09/22/big-data-and-humility
