
How to Interpret a P-Value Histogram

Data scientists and analysts have plenty of experience interpreting p-values from statistical tests... but what happens when there are *millions* of p-values?

By David Robinson · Nov. 11, 17 · Tutorial


So you're a scientist or data analyst, and you have a little experience interpreting p-values from statistical tests. But then you come across a case where you have hundreds, thousands, or even millions of p-values. Perhaps you ran a statistical test on each gene in an organism, or on demographics within each of hundreds of counties. You might have heard about the dangers of multiple hypothesis testing before. What's the first thing you do?

Make a histogram of your p-values. Do this before you perform multiple hypothesis test correction, false discovery rate control, or any other means of interpreting your many p-values. Unfortunately, this basic and simple step rarely gets recommended (for instance, the Wikipedia page on the multiple comparisons problem never once mentions it). The histogram gives you an immediate sense of how your test behaved across all your hypotheses and lets you diagnose some potential problems right away. Here, I'll walk you through a basic example of interpreting a p-value histogram.
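In R, that first look is a one-liner. Here's a minimal sketch, assuming your p-values live in a numeric vector called pvalues (a hypothetical name; one entry per hypothesis):

# Sketch: the first look at a large set of p-values.
# `pvalues` is a hypothetical numeric vector, one p-value per hypothesis.
hist(pvalues,
     breaks = seq(0, 1, by = 0.05),  # 20 bins of width 0.05
     main = "P-value histogram",
     xlab = "p-value")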

Here are six approximate versions of what your histogram might look like. We'll explore what each one means in turn.

Scenario A: Anti-Conservative P-Values ("Hooray!")

If your p-values look something like this:

...then it's your lucky day! You have (on the surface) a set of well-behaved p-values.

That flat distribution along the bottom is all your null p-values, which are uniformly distributed between 0 and 1. Why are null p-values uniformly distributed? Because that's part of the definition of a p-value: under the null, it has a 5% chance of being less than .05, a 10% chance of being less than .1, and so on. That is exactly a uniform distribution.
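You can see this for yourself with a quick simulation (my own sketch, not part of the original article): run many two-sample t-tests in which the null hypothesis is true and plot the resulting p-values; the histogram comes out roughly flat.

# Sketch: p-values from 10,000 t-tests where the null is true.
set.seed(42)
null_pvalues <- replicate(10000, t.test(rnorm(20), rnorm(20))$p.value)

# Roughly uniform between 0 and 1.
hist(null_pvalues, breaks = seq(0, 1, by = 0.05),
     main = "Null p-values", xlab = "p-value")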

That peak close to 0 is where your alternative hypotheses live, along with some potential false positives. If we split this up into nulls and alternatives, it might look like this:

Notice that there are plenty of null hypotheses that appear at low p-values, so you can't just say "call all p-values less than .05 significant" without thinking, or you'll get lots of false discoveries. Notice also that some alternative hypotheses end up with high p-values: those are the hypotheses you won't be able to detect with your test (false negatives). The job of any multiple hypothesis test correction is to figure out where best to place the cutoff for significance.

Now, just how many of your hypotheses are alternative rather than null? You can get a sense of this from the histogram by looking at how tall the peak on the left is: the taller the peak, the more p-values are close to 0 and therefore significant. Similarly, the height of the uniform "floor" on the right side shows roughly how many of your hypotheses are null.

Note that if you want a more quantitative estimate of what fraction of your hypotheses are null, you can use the method of Storey and Tibshirani (2003). In R, you can use the qvalue package to do this.
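Here's a hedged sketch of that estimate, assuming the Bioconductor qvalue package is installed and your p-values are in a hypothetical vector called pvalues:

# Sketch: estimate the proportion of null hypotheses (Storey's pi0).
# Install once with: BiocManager::install("qvalue")
library(qvalue)

qobj <- qvalue(p = pvalues)
qobj$pi0            # estimated fraction of hypotheses that are null
head(qobj$qvalues)  # q-values, which can be used for FDR control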

Scenario B: Uniform P-Values ("Awww...")

Alternatively, you might see a flat distribution (what statisticians call a "uniform" distribution):

This is what your p-values would look like if all your hypotheses were null. Now, seeing this does not mean they actually are all null! It does mean that:

  • At most a small percentage of your hypotheses are non-null. An FDR correction method such as Benjamini-Hochberg will let you identify those (see the sketch after this list).
  • Applying an uncorrected rule like "accept everything with p-value less than .05" is certain to give you many false discoveries. Don't do it!
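A minimal sketch of the Benjamini-Hochberg step using base R's p.adjust (the vector name pvalues is again hypothetical):

# Sketch: Benjamini-Hochberg FDR control on a hypothetical vector `pvalues`.
bh <- p.adjust(pvalues, method = "BH")

# Hypotheses with a BH-adjusted p-value below 0.05 are called significant
# at an estimated false discovery rate of 5%.
which(bh < 0.05)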

Scenario C: Bimodal P-Values ("Hmmm...")

So you have a peak at 0, just like you saw in (A), but you also have a peak close to 1. What do you do?

Don't apply false discovery rate control to these p-values yet. (Why not? Because some kinds of FDR control are based on the assumption that your p-values near 1 are uniform. If you break this assumption, you'll get far fewer significant hypotheses. Everyone loses.)

Instead, figure out why your p-values show this behavior and address it appropriately:

  • Are you applying a one-tailed test (for example, testing whether each gene increased its expression in response to a drug)? If so, those p-values close to 1 are cases that are significant in the opposite direction (genes that decreased their expression). If you want your test to find those cases, switch to a two-sided test (see the sketch after this list). If you don't want to include them at all, you can try filtering out all cases where your estimate is in that direction.
  • Do all the p-values close to 1 belong to some pathological case? An example from my own field: RNA-seq data, which consists of read counts per gene across a variety of conditions, will sometimes include genes for which there are no reads in any condition. Some differential expression software will report a p-value of 1 for these genes. If you can find problematic cases like these, just filter them out beforehand (it's not like you're losing any information!).
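To illustrate the one-tailed point, here's a sketch using base R's t.test; the vectors treated and control are hypothetical expression measurements for a single gene:

# Sketch: the same gene tested one-tailed vs. two-sided.
# `treated` and `control` are hypothetical numeric vectors of measurements.

# One-tailed: a gene whose expression decreased gets a p-value near 1.
p_one_sided <- t.test(treated, control, alternative = "greater")$p.value

# Two-sided: a change in either direction can produce a small p-value.
p_two_sided <- t.test(treated, control, alternative = "two.sided")$p.value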

Scenario D: Conservative P-Values ("Whoops...")

Do not look at this distribution and say, "Oh, I guess I don't have any significant hypotheses." If you had no significant hypotheses, your p-values would look something like (B) above. P-values are specifically designed so that they are uniform under the null hypothesis.

A graph like this indicates something is wrong with your test. Perhaps it assumes the data follows a distribution that it doesn't actually follow. Perhaps it's designed for continuous data while your data is discrete, or designed for normally distributed data while your data is severely non-normal. In any case, this is a great time to find a friendly statistician to help you.

(Update: Rogier, in the comments on the original post, helpfully notes another possible explanation: your p-values may have already been corrected for multiple testing, for example using the Bonferroni correction. If so, you might want to get your hands on the original, uncorrected p-values so you can view the histogram yourself and confirm it's well behaved!)
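A quick simulation (my own sketch) shows why corrected p-values look "conservative": Bonferroni-adjusted p-values pile up at 1 even when every hypothesis is truly null.

# Sketch: Bonferroni correction makes null p-values pile up at 1.
set.seed(1)
raw <- runif(10000)                           # 10,000 null p-values
bonf <- p.adjust(raw, method = "bonferroni")  # same as pmin(raw * 10000, 1)

par(mfrow = c(1, 2))
hist(raw, breaks = seq(0, 1, by = 0.05), main = "Uncorrected")   # flat
hist(bonf, breaks = seq(0, 1, by = 0.05), main = "Bonferroni")   # spike at 1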

Scenario E: Sparse P-Values ("Hold On...")

Sparse p-values are easy to recognize by those big gaps in the histogram. What this means is that while you may have (say) 10,000 hypotheses, they generated only a small number of distinct p-values. You can find out just how many distinct p-values your test generated with this line of R code:

length(unique(mypvalues))  # how many distinct p-values your test produced

Why did you get p-values like this? Did you:

  • Run a bootstrap or permutation test with too few iterations? Try increasing the number of iterations.
  • Run a nonparametric test (for example, the Wilcoxon rank-sum test or Spearman correlation) on data with a small sample size? If you can, either get more data or switch to a parametric test.

Don't run false discovery rate control, which typically makes the assumption that the p-value distribution is (roughly) continuous. If you absolutely need to use these p-values (and can't switch to a test that doesn't give you such sparse p-values), find a statistician!
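To see how sparsity arises, here's a sketch (my own simulation): the Wilcoxon rank-sum test on two groups of three observations can only produce a handful of distinct p-values, no matter how many hypotheses you test.

# Sketch: 10,000 Wilcoxon rank-sum tests on tiny samples (n = 3 per group).
# The rank statistic can take only a few values, so the p-values are sparse.
set.seed(7)
sparse_p <- replicate(10000, wilcox.test(rnorm(3), rnorm(3))$p.value)

length(unique(sparse_p))                       # only a handful of distinct values
hist(sparse_p, breaks = seq(0, 1, by = 0.05))  # a gappy histogram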

Scenario F: Something Even Weirder ("What the...?!?")

A big bump in the middle? A bunch of random peaks? Something that looks like nothing in this post?

Stop whatever you're doing and find a statistician. There may be a simple explanation and/or fix, but you want to make sure you've found it before you work with these p-values anymore.

In closing, this post isn't a replacement for having a qualified statistician look over your data. But just by glancing at this simple visualization, you can tell a lot about how your test performed across your hypotheses, and you'll be a lot closer to knowing what to do with them.


Published at DZone with permission of David Robinson, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
