P-Values Without Proof
March 19, 2026
The Premise
For half a century, the p-value has been treated as a passport to publishability. Cross the sacred threshold of p < 0.05 and a finding is declared “significant.” Yet significance is not substance; a p-value is merely the probability of observing data at least as extreme as ours, assuming the null hypothesis is true. That assumption is almost never exactly true in biomedical contexts, rendering the p-value an elaborate exercise in conditional fantasy. The result is a ritual of false certainty: a statistic mistaken for a proof.
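What a p-value does and does not measure can be made concrete with a small permutation simulation: generate many datasets under the null hypothesis of no group difference, and count how often they look at least as extreme as the observed data. The group values, sample sizes, and seed below are invented for illustration only.

```python
# Estimating a p-value by permutation: the fraction of null-generated
# relabelings whose test statistic is at least as extreme as the observed one.
# Illustrative sketch; the data and seed are made up.
import random

random.seed(1)
group_a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4]
group_b = [4.7, 5.0, 4.6, 4.9, 5.1, 4.5]
observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))

pooled = group_a + group_b
n_iter = 20_000
extreme = 0
for _ in range(n_iter):
    random.shuffle(pooled)              # relabel under the null: no real difference
    a, b = pooled[:6], pooled[6:]
    if abs(sum(a) / 6 - sum(b) / 6) >= observed:
        extreme += 1

p_value = extreme / n_iter
print(f"p \u2248 {p_value:.3f}")        # chance of data this extreme IF the null holds
```

Note what this number is conditioned on: it says nothing about the probability that the null is true, only about how surprising the data would be if it were.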
The Distortion
The overreliance on p-values distorts every layer of the research process.
- Design bias. Studies are powered not to detect meaningful effects but to cross the magic line. Sample sizes, endpoints, and analyses are chosen for statistical convenience rather than clinical sense.
- Researcher degrees of freedom. Multiple endpoints, subgroup fishing, and optional stopping inflate the chance of “significance.” The p-value becomes a narrative device, not an inferential one.
- Binary thinking. The rich continuum of evidence collapses into a yes/no dichotomy. A result at p = 0.049 is lionized; one at p = 0.051 is dismissed — though they differ by less than rounding error.
- Suppression of uncertainty. Journals and funders privilege clear conclusions, not honest intervals. Confidence becomes marketing copy, not an estimate of variability.
In this way, the p-value culture converts scientific modesty into managerial performance.
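The multiple-endpoints point above has a simple arithmetic core: test enough independent null endpoints and “significance” arrives by chance alone. A minimal sketch, with 10 endpoints and a 0.05 threshold assumed for illustration:

```python
# With k independent endpoints each tested at level alpha, the chance of at
# least one "significant" result under a true null is 1 - (1 - alpha)**k.
# The values of k and alpha here are illustrative assumptions.
alpha, k = 0.05, 10
p_any = 1 - (1 - alpha) ** k
print(f"P(at least one p < {alpha} across {k} null endpoints) = {p_any:.2f}")
```

With ten endpoints, roughly a 40% chance of a publishable “finding” exists before any real effect enters the picture.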
The Consequence
This distortion leads to a literature dense with significant findings and thin on truth. Meta-analyses reveal effect sizes shrinking or vanishing as studies replicate. Clinical decisions made on such fragile foundations expose patients to ineffective or harmful treatments. Policymakers, seeing statistical “proof,” commit resources prematurely, while null or borderline results disappear into the file drawer.
Worse, the moral grammar of science is corrupted. The goal shifts from discovery to validation — to “getting the result.” Statistical literacy declines as statistical theater expands. The badge of significance replaces the burden of understanding.
The Way Forward
The repair of inference begins with humility.
- Abandon the ritual. Replace the binary threshold with estimation: confidence intervals, Bayesian posterior probabilities, likelihood ratios. Evidence is continuous.
- Report effect sizes and priors. Show how magnitude and plausibility, not arbitrary cutoffs, drive belief.
- Encourage pre-registration and transparency. Protect inference from the flexibility of hindsight.
- Educate reviewers and editors. Judgment should value mechanistic plausibility and reproducibility over cosmetic significance.
- Reward replication. Treat the second study that confirms an effect as the triumph, not the first that finds one.
In a science reclaimed from the tyranny of the p-value, proof is earned through coherence and convergence — not through decimals that flatter our uncertainty.
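As one sketch of the estimation-first reporting the list above advocates, the snippet below reports an effect size with a 95% interval rather than a binary verdict. The measurements are invented, and a normal approximation (critical value 1.96) is assumed in place of a t-based interval.

```python
# Reporting an estimate with its uncertainty instead of a yes/no verdict.
# Stdlib-only sketch; the data are fabricated for illustration.
from math import sqrt
from statistics import mean, stdev

treated = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5, 13.0, 12.2]
control = [11.6, 12.0, 11.9, 12.4, 11.5, 12.1, 11.8, 12.3]

diff = mean(treated) - mean(control)                      # effect size
se = sqrt(stdev(treated) ** 2 / len(treated)
          + stdev(control) ** 2 / len(control))           # standard error
lo, hi = diff - 1.96 * se, diff + 1.96 * se               # approximate 95% CI
print(f"effect = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The interval carries the information a lone p-value discards: both the magnitude of the effect and how precisely it is known.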
Selected References
- RegenMed (2025). Genuine Medical Research Has Lost Its Way.
- Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129–133.
- Goodman, S. (1999). Toward Evidence-Based Medical Statistics: The p-Value Fallacy. Annals of Internal Medicine, 130(12), 995–1004.
- Amrhein, V., Greenland, S., & McShane, B. (2019). Retire Statistical Significance. Nature, 567(7748), 305–307.
- Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8).
- Gelman, A., & Loken, E. (2014). The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem. Department of Statistics, Columbia University.