After 15 weekly posts, and a few noteworthy developments, here are updates on stories from the last few months.
This post is now watchable, although my public speaking skills may not do the topic justice. There’s a friendly debate afterwards with the president of SIPS, Priya Silverstein, that may be interesting to those deep enough in open science lore to appreciate the undercurrents. She says she’s more optimistic about adversarial collaboration than about adversaries. I said I hope she’s right.
SIPS was more than charitable in giving me a spot, and Dr. Silverstein was constructively critical. Obviously, I am not optimistic, and that’s the central disagreement highlighted in the post and on this blog.
The undercurrents, and the SIPS pre-conference discussion with early replication crisis reformers, may be worth a follow-up at some point. For now, I can recommend the pre-conference discussion about what worked ten years in, and the metascience reading list, as great bookends on the replication crisis.
A correspondence with The Lancet about an apparent subtraction error
This is my correspondence with The Lancet on the 2023 paper "Does repeated influenza vaccination attenuate effectiveness? A systematic review and meta-analysis" following my comment on PubPeer from January 2024.
COPE is back in the news. They are promising an update this month that would recommend giving credit to people who point out retractable errors in papers. Not giving credit to people reporting errors is one of the most indefensible policies in publishing.
The Lancet subtraction question part II
The authors of “Does repeated influenza vaccination attenuate effectiveness?” responded to my PubPeer comment, which was the subject of a correspondence with The Lancet that I published on May 26th. Below is their response from PubPeer and my response to them. I added the direct correspondence with the authors to the
After I posted this response, I emailed the authors again because they had sent me an email asking what remaining questions I had. The matter seemed settled to them: subtracting effects is standard practice (it’s not) and the numbers sometimes don’t look anything like subtraction because they’re meta-analyzed.
A great rule of thumb for debate in general is “that which proves anything proves nothing.” Saying “just run the code,” or that something technically could be true because anything is possible in statistics, are not good arguments. In this case, I publicly stated that I’m wrong if they can provide a handful of numbers that produce two rows of the table, and they have declined.
(Regardless, I did run the code and could not find any numbers that result in more than a 1-2% difference from subtraction. This is mostly apparent just from looking at the code, or from reading the paper, where they include an equation.)
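For readers who want to see what kind of check this is, here is a minimal sketch. The numbers below are invented for illustration, not taken from the paper; the point is only the shape of the test: does a reported difference fall within a small tolerance of simple subtraction of the two effectiveness rows?

```python
# Hypothetical consistency check: is a reported "delta VE" value within
# a small tolerance of simply subtracting the two vaccine-effectiveness rows?
# All numbers below are made up for illustration; they are NOT from the paper.

def consistent_with_subtraction(ve_repeated, ve_single, reported_delta, tol=2.0):
    """Return True if reported_delta is within `tol` percentage points
    of ve_repeated - ve_single."""
    return abs((ve_repeated - ve_single) - reported_delta) <= tol

# Invented example rows (percent VE):
rows = [
    {"ve_repeated": 35.0, "ve_single": 50.0, "reported_delta": -15.0},  # exact subtraction
    {"ve_repeated": 42.0, "ve_single": 55.0, "reported_delta": -9.0},   # 4 points off
]

for r in rows:
    print(consistent_with_subtraction(**r))  # True, then False
```

A table produced by meta-analysis rather than subtraction should fail this check for at least some rows; the claim in the post is that, with the published code, it never fails by more than 1-2%.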
The Lancet subtraction question is also notable because it is the most obvious error with the lowest reputational stakes and highest public health stakes (by which I mean vaccine effectiveness is important). The first author went on to another career outside science. The apparent errors are in the supplementary material and wouldn’t require retraction on their own.
Nonetheless, I did not get a response. I suspect the email asking for further questions was one author trying to distance themselves without explicitly disagreeing with the co-authors. I’m not going to indulge this possibility because I’d been asking for over a year for any statements by the journal and the authors to be made publicly, not by email, and I still am.
COPE recommends using the same platform where the error was reported (PubPeer).
NIH encourages schools to teach rigor
Seven years ago, the long arc of the Community for Rigor began with an NIH survey. It found that there were few classes on rigorous research practices in higher education. This should have come as no surprise since, by then, many incontrovertibly good practices had been measured at less than
One of the major topics in these posts is that the Community for Rigor claims to take community feedback publicly, and then really resists it behind the scenes. They said this was because “we cannot guarantee that all users’ ideas will be reflected in our materials. That would unfortunately be impossible to implement” and that hundreds of people had signed the waiver to give feedback, contrary to my objections to the waiver.
The update is that I wanted to know whether they have now taken any feedback from these hundreds of people, and they have. The slides for the confirmation bias lecture have been updated to include some of my feedback:
Correct attribution of “researcher degrees of freedom” to Simmons et al. (2011)
Addition of pre-analysis plans with preregistration
Addition of more post hoc choices
“Employing peer review” has been removed. I had said this usually means journal peer review rather than adversarial collaboration or red teaming, that is, practices researchers can actually choose to adopt.
An activity no longer has implausible Cohen’s d values, and the values are now labeled in standard deviations.
The possibility of unmasking even if treatment has no beneficial effect is now included.
Additionally, the materials now mention the term p-hacking. I didn’t recommend this, but I should have.
My expectation was that no changes would be made, certainly not ones I’d suggested, so that’s great. My thesis about Community for Rigor is that they shy away from confrontation with their colleagues. I think that’s been tried in metascience and it didn’t really work. (Of course, C4R deserves credit for doing anything at all, which is great too.)
Update 2: FOIA
My long-awaited and long-haggled-over FOIA request for Community for Rigor emails has been promised early this week. The same thesis about conflict applies: I suspect NIH/NINDS and the administrators of the project at University of Pennsylvania knew that some grantees were avoiding conflict with their colleagues by saying the practice of science is great already. No reform necessary. In general, I think all of the participants will tend to avoid conflict and say the practice of science is better than it is.
Concrete examples were given in the original post: Mary Harrington depicting the research life-cycle as a highway one can get on and off at any time, and where you can collect and analyze data and then change your hypothesis. Benedict Kolber’s group saying “most published research is not truly confirmatory (and that’s ok)” (which was amended in the materials at Penn by adding “However, exploratory work is sometimes reported as confirmatory. (and that’s NOT okay!!)"). The guest speaker saying preregistration is not a good idea and everyone nodding along.
My hypothesis about the FOIA documents is that the limiting factor in metascience is not cognitive (See, for example, Wiradhany et al., 2025). It is conflict of interest. Scientists and metascientists don’t want to get on the wrong side of mainstream science, which outnumbers metascience by many multiples. In other words, scientists working in academia and even NIH officials are not appropriate adversaries.
I will be shocked if this doesn’t show up in the documents and I’ll admit I was wrong about C4R if it doesn’t. Obviously, if I see a lot of conflict — particularly productive conflict — that counts against my hypothesis.
The cherry on top
If this blog has a thesis, it is that research is troubled and that addressing it with half-measures may make things worse. Science may boot out or exhaust the reformers. The reformers may be “captured” the way that industries capture their regulators. This is because the incentives in science are strong. We assumed in the early days of the replication crisis…
As mentioned in a previous post, I had not realized when I posted about the fragile p-value paper that the author had said essentially the same thing on Bluesky: that the effect could be more p-hacking.
No YOU'RE a slug
In the past month, the reproducibility crisis has become a Republican thing. Trump's “Gold standard” executive order spurred the “Stand Up for Science” letter signed by Brian Nosek and others. Although the signatories agreed up through section 6 of the order, and Brian said section 3 is all the things his
There was an otherwise great post in Sensible Medicine that used Holden Thorp’s op-ed to support the idea that “comparatively small numbers of scientists... can drive negative public narratives about science.” I would recommend the companion paper too. My criticism of the authors is that they tiptoe around the behaviors that plausibly caused the replication crisis, behaviors that many, many researchers engage in. That rebuttal will be published soon.
The "broken clock" reaction to politics in science
Retraction Watch recently published a guest post, “NIH-funded replication studies are not the answer to the reproducibility crisis in pre-clinical research.” It was alarming to supporters of replication and, apparently, to supporters of political independence.
I emailed Andrew Gelman, and he provided a counterpoint on Jay Bhattacharya’s dedication to debate, which I have published in this post. Andrew says that Jay misrepresented his views (see Gelman, and Bhattacharya). The exchange is accessible and interesting, and I agree with Andrew. “[Gelman] incorrectly thought we had not accounted for the possibility of false positives” is not a good way to characterize what happened. Andrew’s criticisms of the Santa Clara paper have everything you’d want in a post-publication review: expertise, humility, attention to detail, and a few barbs. The CI and specificity arguments are sound, and it’s notable that the paper’s authors go to great lengths to avoid crossing zero.
That said, I still think the political platform of restoring debate in science is valuable enough to support Jay’s whole shtick. I’d support a liberal NIH director with the same principles. That’s not to say Andrew’s objection doesn’t “move the needle” on this support; it does. It’s a ding on Jay’s reputation, no more and no less than what I think was intended.
The politics of embarrassment: Part III
Part III in a series of essays centered on Imre Lakatos’ “Science and Pseudoscience” takes issue with “Universities are Worth Saving” by Jonathan Rauch and Sam Harris’ podcast episode defending expertise (“Intellectual Authority and its Discontents”).
In an unsurprisingly unpopular post about how the place you all work is bad, I also decided to discuss COVID origin. Andrew Gelman happened to post about it too, which is extremely helpful, and the reason I emailed him above.
We can’t take things like the lab leak, which have a reasonable but small chance of being true, and say, “Well, then they’re not true.” One anonymous commenter on Gelman’s blog put it this way, using the same statistic I used:
Some scientists (20% of experts) lean towards a lab origin. It’s a number higher than the percentage of experts who lean towards creationism or climate denial, so therefore we can certainly agree the truth is less certain than on those two issues. However, after reviewing the evidence I lean towards >99.9% zoonosis.
I think we need to stop this, and carving out extreme positions like this is very clearly just a product of the internet.
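The commenter’s own statistic already illustrates the tension arithmetically. As a quick sketch (my arithmetic, not the forthcoming post’s argument): if 20% of experts “lean” toward a lab origin, meaning they assign it more than 50% probability, then a simple linear opinion pool over all experts puts the pooled probability at 10% or more, even if every other expert assigns exactly zero:

```python
# Illustration of a linear opinion pool floor. Assumption (mine, for the
# sketch): "leaning" toward a hypothesis means assigning it > 50% probability.

def pooled_floor(frac_leaning, lean_threshold=0.5):
    """Lower bound on the average probability across experts, assuming
    leaning experts assign exactly the threshold and all others assign 0."""
    return frac_leaning * lean_threshold

print(pooled_floor(0.20))  # 0.1
```

A pooled floor of 0.1 is hard to square with a personal posterior of >99.9% zoonosis, which is the gap the commenter’s framing papers over.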
I’m going to dedicate a post to a simple statistical argument on COVID origin in the coming weeks. (This is a good time to say that subscribing anonymously is easy with an RSS reader and there are lots of good metascience blogs available this way. The feed is: redteamofscience.com/feed)