Science reform: Do we need a plan B?
At the tenth annual conference of the Society for the Improvement of Psychological Science
There are two points I would like to make:
1. There is no Plan B in science reform.
2. We should have a Plan B, and it should be to restore the adversarial nature of science as our top priority.
These are separate points. Even if you disagree about what Plan B should be, please consider whether or not there is a contingency plan at all.
What is Plan A?
Plan A is to do metascience research and write papers. It’s a good plan. Ten years ago, I believed it would work to improve the practice of science. I think the early reformers believed it would work too and underestimated how much advocacy would be needed. Adoption of open science principles should have been automatic.
Adoption is automatic if researchers are truly interested in discovery. Anyone who is interested in discovery and in seeking the truth should care whether their sample size is too small, whether they are p-hacking, or whether they are pursuing a question with an infinitesimal base rate.
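To see why, here is a minimal simulation sketch, not taken from any of the cited papers, of what underpowered studies, optional stopping, and a low base rate of true hypotheses do to a literature. All of the numbers (effect size, base rate, stopping rule) are assumptions chosen only for illustration.

```python
# Illustrative sketch only: arbitrary effect size, base rate, and stopping rule.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def study_is_significant(effect, n=20, p_hack=False, max_n=60, step=10):
    """One two-group study. With p_hack=True, the researcher keeps adding
    participants and re-testing until p < .05 or the budget (max_n) runs out."""
    a = rng.normal(effect, 1.0, max_n)
    b = rng.normal(0.0, 1.0, max_n)
    if not p_hack:
        return stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05
    return any(stats.ttest_ind(a[:k], b[:k]).pvalue < 0.05
               for k in range(n, max_n + 1, step))

def false_discovery_rate(base_rate=0.10, effect=0.4, studies=4000, p_hack=False):
    is_true = rng.random(studies) < base_rate   # only a few hypotheses are actually true
    sig = np.array([study_is_significant(effect if t else 0.0, p_hack=p_hack)
                    for t in is_true])
    return 1.0 - is_true[sig].mean()            # share of "findings" that are false

print("false discoveries, honest but underpowered:", round(false_discovery_rate(), 2))
print("false discoveries, plus optional stopping :", round(false_discovery_rate(p_hack=True), 2))
```

Under these made-up numbers, a majority of “significant” findings are false even when everyone behaves honestly, and optional stopping only makes it worse.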
What we found out is that there are many interests in science besides discovery: promotion, tenure, influence, et cetera. From a purely scientific, disinterested standpoint, we have to accept the results of this ten-to-twenty-year experiment of running Plan A. We have a problem that is much worse than what metascientific discovery can solve. Discovery cures ignorance. The problem goes beyond ignorance.
The question of whether or not curing ignorance works was tested recently (Wiradhany et al., 2025). Understanding of questionable research practices was only weakly associated with not participating in them, and 30% of respondents “endorsed p-hacking.”
There is no Plan B
I can't prove a negative, but I will claim that there has been no contingency plan in scientific reform. Trying to convince scientists, schools, funders, and journals to change is the only plan.
To take a list of interventions from another recent paper (Dudda et al., 2025), the standard interventions are: preregistration and registered reports, badges, checklists, journal guidelines and policies, data sharing, training, tools, statistical reform, and collaborative research.
These interventions require convincing scientists, schools, funders, and journals to change their behavior. They all ask a set of people to change despite the incentives they face. If those incentives are too strong and too universal, then too bad.
In the language of publish or perish, Plan A asks people to perish. We should have known this, and known that they might decline to perish. We should also have known that there could be something bad about the open science adopters being the ones who perish from their fields.
Ten years ago, Eric-Jan Wagenmakers was quoted in a Nature article on adversarial collaboration, exactly the kind of thing that, in my opinion, would have worked: it is “about as attractive as putting one’s head on a guillotine” (Nuzzo, 2015). We should have listened. We all know how true this is, how little anyone wants to change. We should also know how dangerous it is to “kill off” the best scientists.
Plan A should continue regardless. Plan A is the shoulders of giants. But we must interpret the results dispassionately.
We are not dispassionate on this topic. Progress is often interpreted optimistically, for instance, with the “cumulative percentage” of adopters (Ferguson et al., 2023) or selective reporting of hopeful signs (Korbmacher et al., 2023). A recent survey found that only “57% of people believe that most scientists are honest.” The authors interpreted this as “no widespread lack of trust in scientists” (Cologna et al., 2025).
Plan B: Outside Adversaries
Plan B should be to restore the adversarial nature of science as our top priority. By “top priority” I mean that sacrifices will need to be made. In my opinion, the sacrifice that needs to be made is expertise. If everyone who has expertise also has incentives that compromise research, then we need to find adversaries who don't have these incentives, even if they are not recognized as experts.
This is a big compromise. One of the ways we “pay” for research is with respect for the status of researchers. We enforce rigor and excellence in research by supporting a hierarchy of scientists. We assume anyone who has earned their way to the top must be very special. Anyone who has earned their way to the middle is too.
It's scary to take away the “status game” of science. The game works in some way. It built the modern world. Planes aren't falling out of the sky, so why stop now?
Well, because we know that this system is wrong in principle. It’s not Mertonian. It can be “captured.” It is not a self-contained truth machine. We know hierarchies go wrong. We know how they go wrong, and all we need to do is dispassionately assess whether that applies to science.
The Natural Selection of Bad Science
One of the ways that hierarchies go wrong is that it becomes too difficult to get to the top. The hierarchy becomes obsessed with deciding who’s first, which devolves into anti-Mertonian metrics.
“The natural selection of bad science” (Smaldino and McElreath, 2016) is one of the great papers of the last ten years. It says that “generations” of scientists trained at successful labs, where success is in part a product of cheating, may spawn ever more bad science.
It doesn't predict that a hierarchy that is too hard to climb selects the best scientists. It predicts the opposite. If you send ten people out to sea on a raft and one is left at the end, you may not have selected for anything good.
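A drastically simplified toy of this dynamic, written by analogy rather than taken from Smaldino and McElreath’s actual model, makes the point: if publication count alone decides who trains the next generation, and a made-up “rigor” parameter slows publication, average rigor declines without anyone intending to cheat.

```python
# A toy sketch by analogy only, not the Smaldino & McElreath (2016) model.
# "Rigor" is a made-up number; low rigor means more publishable results per year.
import numpy as np

rng = np.random.default_rng(1)
labs = rng.uniform(0.2, 0.8, 100)  # each lab's rigor; population mean starts near 0.5

for generation in range(50):
    papers = rng.poisson(10 * (1.2 - labs))        # less rigor -> more output
    winners = labs[np.argsort(papers)[-10:]]       # most-published labs train the next cohort
    offspring = np.clip(rng.choice(winners, 10) + rng.normal(0, 0.02, 10), 0, 1)
    labs[np.argsort(papers)[:10]] = offspring      # least-published labs are replaced

print("mean rigor after selection:", round(float(labs.mean()), 2))  # drifts well below 0.5
```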
It takes half a human lifetime to get a first grant in the US. These are not people eager to blow up the system that chose them over so many others.
One of the ways that hierarchies go wrong is that it becomes too difficult, morally, to get to the top. Once this happens, I believe, not blowing it up becomes the scarier option.
But we don’t need to speculate about morality, or about the evolutionary fitness of scientists. All we need to do is look at the numbers: the reproducibility of the current practice of science. How bad does it have to be before we sacrifice recognized expertise? How bad before we agree that we need to prioritize at all, that some things are more important than others if we want the numbers to go up?
I don’t need to tell you these numbers. To summarize, the Brazilian Reproducibility Initiative just published a preprint with a very useful number that gets at the controversies around measuring reproducibility. It too is imperfect, but considering specific biomedical study designs in Brazil, and every defensible way of measuring, the reproducibility rate is 26%, give or take ten points (Brazilian Reproducibility Initiative, 2025).
Significantly, these results cannot be considered an improvement. Previous studies put the rate, broadly, at slightly under 50%. It's plausible to assume that existing interventions, warnings, and admonitions are not working.
The new study should come as no surprise, though. There has been a decade of work that found overwhelming problems at every stage of the research lifecycle. It found rigor practiced less than 50% of the time in: sharing required data (Alsheikh-Ali et al., 2011; Vines et al., 2014), not p-hacking (John et al., 2012), math (Manrai et al., 2014), registration (Harriman and Patel, 2016), reporting statistics (Brown and Heathers, 2016), randomization and blinding (Born, 2024), teaching (Koroshetz et al., 2020), and the central functions of replication (Open Science Collaboration, 2015; Errington et al., 2021) and self-correction (von Hippel, 2022; Clarke et al., 2024).
This list is far from exhaustive, and these only concern the papers that are published, not what researchers attempt to publish, which is frequently provably false (Carlisle, 2020).
Psychology was tested ten years ago and should, of course, be retested.
The Red Team of Science
Restoring the adversarial nature of science by sacrificing expertise is what I call “The Red Team of Science,” after the Lakens article on COVID research (Lakens, 2020), which borrows the term from high-stakes fields like computer security and national defense. Science is a high-stakes field.
Many of the references in this post came out after this proposal was first made four months ago. One is notable because it is a similar proposal by someone with a similar starting point: admitting how little has gone right in science reform in the last ten years. From that starting point, there are few viable options.
Csaba Szabo says in his book “Unreliable,” “the entire body of published literature must be systematically cleaned up. This should not be left to individual data detectives; this should be a concentrated effort spearheaded and funded by a consortium of the major publishers and conducted systemically, and very thoroughly, by professional data detectives” (Szabo, 2025).
Writing in For Better Science a few weeks later, he describes an otherwise excellent metascience conference: "in addressing these challenges, the conversation did not move beyond what I call the 'standard solutions'... that is, ideas and initiatives which are palatable to official stakeholders but have neither worked nor will ever really work." He goes on to describe something very similar to the list reproduced from Dudda et al. above, and to what I call Plan A.
I disagree that a consortium of the major publishers is plausible, and he may disagree on parts of my proposal. However, I think there is similarity between our proposals because a plan that doesn't require adoption by existing stakeholders is all that’s left to propose.
Furthermore, I propose that it takes dramatically less time to become an expert in the things that science does wrong (base rate neglect, p-hacking, and publication bias) than it does to become a scientist. Certainly it takes less time than becoming an official stakeholder with control of a government grant. Research suggests that the more educated you are, the more favorable your opinion of questionable research practices and fraud (Pickett and Roche, 2018), and there is evidence that laypeople can predict replicability. In other words, their participation is positive even without training (Hoogeveen et al., 2020).
The other difference between these two framings is that I believe The Red Team will arise with or without our help. It is the default solution. When one group, in this case scientists, schools, journals, and funders, has some interest, it’s only a matter of time before they run into another group with the opposite interest.
If you're familiar with American politics, you'll know why I say that a group with the opposite interest will take over. Some of these interests are purely political. Some are financial, like the short sellers who funded the Red Teaming of Alzheimer’s research for all of $18,000.
The group that rises up to clean up the literature may not have pure incentives. They may not be rigorous. What I propose is getting ahead of the politically motivated, the financially motivated, the people with an axe to grind. Fund training, recruiting, and support for professional auditors who don't have the incentives scientists normally have. Study how much expertise we can sacrifice to get less conflict of interest and raise the number of replicable studies.
Conclusion
It may be that the existing system of p-hacking, publication bias, and base rate neglect has to stay. It may be that we can’t “pay” for science without a hierarchy that rewards people based on metrics and weeds out unserious scientists. It may be that science gives too much rhetorical advantage to scientists over the professional auditors I propose. It could be that no auditor can ever convince the public, or other scientists, when matched against people who spent half their lifetimes getting their first grant. It's possible that another event, like the Daryl Bem paper or Diederik Stapel’s fraud, will shock the conscience of science instead and make poor methods as taboo as claiming divine intervention. It could be that science reforms will be enforced by norms and by new funder rules.
I don’t believe this will happen. I believe things will get worse. The public's trust in science will continue to fall. There will be more energy behind politically motivated audits of science than there will be behind scientific ones. And even politically-motivated audits will find some of the bad papers.
I believe an occupation called Scientific Auditor is possible, inexpensive, and productive. We have to be careful never to let this position be compromised by conflict of interest, but otherwise it will work the way many other adversarial systems do, despite some sacrifice. The jury system and the press, for instance, are adversarial systems that depend on sacrificing some level of expertise.
Of course we know the danger of adopting the wrong proposal. However, this one recognizes the high stakes of science to a greater degree than they are recognized today, and it is the more cautious option. The adversarial proposal elevates science to the level of the justice system or national security, where it belongs.
References
Alsheikh-Ali, Alawi A., et al. “Public Availability of Published Research Data in High-Impact Journals.” PLoS ONE, vol. 6, no. 9, Public Library of Science San Francisco, USA, 2011, p. e24357.
Born, Richard T. “Stop Fooling Yourself! (Diagnosing and Treating Confirmation Bias).” eNeuro, vol. 11, no. 10, Society for Neuroscience, 2024.
Brazilian Reproducibility Initiative. “Estimating the Replicability of Brazilian Biomedical Science.” bioRxiv, Cold Spring Harbor Laboratory, 2025.
Brown, Nicholas JL, and James AJ Heathers. “The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology.” Social Psychological and Personality Science, vol. 8, no. 4, Sage Publications Sage CA: Los Angeles, CA, 2017, pp. 363–69.
Carlisle, John Bernard. “False Individual Patient Data and Zombie Randomised Controlled Trials Submitted to Anaesthesia.” Anaesthesia, vol. 76, no. 4, Wiley Online Library, 2021, pp. 472–79.
Clarke, Beth, et al. “The Prevalence of Direct Replication Articles in Top-Ranking Psychology Journals.” American Psychologist, American Psychological Association, 2024.
Cologna, Viktoria, et al. “Trust in Scientists and Their Role in Society across 68 Countries.” Nature Human Behaviour, Nature Publishing Group UK London, 2025, pp. 1–18.
Dudda, Leonie, et al. “Open Science Interventions to Improve Reproducibility and Replicability of Research: A Scoping Review.” Royal Society Open Science, vol. 12, no. 4, The Royal Society, 2025, p. 242057.
Errington, Timothy M., et al. “Investigating the Replicability of Preclinical Cancer Biology.” eLife, vol. 10, eLife Sciences Publications, Ltd, 2021, p. e71601.
Ferguson, Joel, et al. “Survey of Open Science Practices and Attitudes in the Social Sciences.” Nature Communications, vol. 14, no. 1, Nature Publishing Group UK London, 2023, p. 5401.
Harriman, Stephanie L., and Jigisha Patel. “When Are Clinical Trials Registered? An Analysis of Prospective versus Retrospective Registration.” Trials, vol. 17, Springer, 2016, pp. 1–8.
Hoogeveen, Suzanne, et al. “Laypeople Can Predict Which Social-Science Studies Will Be Replicated Successfully.” Advances in Methods and Practices in Psychological Science, vol. 3, no. 3, Sage Publications Sage CA: Los Angeles, CA, 2020, pp. 267–85.
John, Leslie K., et al. “Measuring the Prevalence of Questionable Research Practices with Incentives for Truth Telling.” Psychological Science, vol. 23, no. 5, Sage Publications Sage CA: Los Angeles, CA, 2012, pp. 524–32.
Korbmacher, Max, et al. “The Replication Crisis Has Led to Positive Structural, Procedural, and Community Changes.” Communications Psychology, vol. 1, no. 1, Nature Publishing Group UK London, 2023, p. 3.
Koroshetz, Walter J., et al. “Framework for Advancing Rigorous Research.” eLife, vol. 9, eLife Sciences Publications, Ltd, 2020, p. e55915.
Lakens, Daniël. “Pandemic Researchers–Recruit Your Own Best Critics.” Nature, vol. 581, no. 7807, Nature Publishing Group, 2020, pp. 121–22.
Manrai, Arjun K., et al. “Medicine’s Uncomfortable Relationship with Math: Calculating Positive Predictive Value.” JAMA Internal Medicine, vol. 174, no. 6, American Medical Association, 2014, pp. 991–93.
Nuzzo, Regina. “Fooling Ourselves.” Nature, vol. 526, no. 7572, Nature Publishing Group, 2015, p. 182.
Open Science Collaboration. “Estimating the Reproducibility of Psychological Science.” Science, vol. 349, no. 6251, American Association for the Advancement of Science, 2015, p. aac4716.
Pickett, Justin T., and Sean Patrick Roche. “Questionable, Objectionable or Criminal? Public Opinion on Data Fraud and Selective Reporting in Science.” Science and Engineering Ethics, vol. 24, Springer, 2018, pp. 151–71.
Smaldino, Paul E., and Richard McElreath. “The Natural Selection of Bad Science.” Royal Society Open Science, vol. 3, no. 9, The Royal Society, 2016, p. 160384.
Szabo, Csaba. Unreliable: Bias, Fraud, and the Reproducibility Crisis in Biomedical Research. Columbia University Press, 2025.
Vines, Timothy H., et al. “The Availability of Research Data Declines Rapidly with Article Age.” Current Biology, vol. 24, no. 1, Elsevier, 2014, pp. 94–97.
von Hippel, Paul T. “Is Psychological Science Self-Correcting? Citations before and after Successful and Failed Replications.” Perspectives on Psychological Science, vol. 17, no. 6, Sage Publications Sage CA: Los Angeles, CA, 2022, pp. 1556–65.
Wiradhany, Wisnu, et al. “Open Minds, Tied Hands: Awareness, Behavior, and Reasoning on Open Science and Irresponsible Research Behavior.” Accountability in Research, Taylor & Francis, 2025, pp. 1–24.