The authors of “Does repeated influenza vaccination attenuate effectiveness?” responded to my PubPeer comment, which was the subject of the correspondence with The Lancet I published on May 26th. Below are their response from PubPeer and my response to them. I added the direct correspondence with the authors to the previous post.
The usual preamble applies. The authors are not “bad apples.” Statistically, it's very likely that their code is part of the 74% of research code repositories that don't run (Trisovic et al., 2022), and that they are among the 93% of authors who claim unchanged results (Hardwicke et al., 2022). It’s much more productive and interesting to think about how to improve these numbers systemically.
PubPeer response from the authors
Sheena G Sullivan
For each individual study/age-group/virus/season combination delta VE is calculated as the difference between two vaccination group VE estimates (e.g. current and previous season VE - current season VE). The summary delta VE estimates highlighted in your table are meta-analysed estimates of the individual deltas for each age-group/virus/season combination. To reproduce these estimates, please refer to the R script provided in the supplementary material. Our results should be reproducible using the information provided in the forest plots.
My response
I think we're closer to the goal. Thank you to the authors for replying. What we have established is that the formula VEcp - VEp is not what was used to calculate ΔVEp. The calculation is in the R script and the data needed to run the script is in the forest plots.
In general, this is not an acceptable answer. Even if I recalculated the numbers from this description, I wouldn't be able to show that the numbers are right or wrong (other than by inflicting the same task on someone else). This goes for almost all author responses of this form.
I don't think I need to justify this further. “Piece together the data from the figures and run the code” is almost never an acceptable answer. Accepting it means no corrections for any paper that publishes some data values and a script of some kind, and more burden on post-publication reviewers. Even if there is some calculation that could be called a meta-analysis and that produces these numbers, this answer is not acceptable on principle.
However, maybe I am not familiar enough with the domain and what the authors are saying is trivial. So I will justify why this is not an acceptable answer in this particular case.
1. The script in the supplement PDF is not a runnable script, and it has been edited for publication. There's no way to tell how much it was edited and whether this affects the results. For instance, there are smart quotes in some places and not in others. This usually happens when a document has been edited in Word or another word processor. Smart quotes are not legal characters in R.
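As a minimal illustration of that point (nothing here is from the authors' script): R's parser rejects curly quotes outright.

```r
# Straight quotes parse; smart (curly) quotes do not.
parse(text = 'x <- "H1N1"')                   # fine
try(parse(text = 'y <- \u201cH1N1\u201d'))    # "unexpected input" error
```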
More definitively, ma.dat$delta.ve.ci is not defined. This would halt the program before the meta-analysis if the published script were the script that was used by the authors.
One comment in the script says “Meta-analysis model [shown for A(H1N1)pdm09].” The full script would need to handle more than one specific strain to answer my question. (“A(H1N1)pdm09” is then hard-coded in the next lines, so the comment appears to be accurate.)
There is very little information in the script. The first third of it is duplicative code that could be replaced in a few lines. The purpose is to apply two expressions (log((100-x)/100) and (ub - lm)/3.92) to several variables.
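As a sketch of what I mean, with invented values and invented column names (I don't know exactly which variables the full script transforms), the two expressions can be defined once and applied to every relevant column:

```r
# Hypothetical stand-in for the supplement's data; values and column names are invented.
dat <- data.frame(ve = c(45, 60, 38), ve.lb = c(30, 48, 15), ve.ub = c(57, 69, 55))

to_log_rr <- function(x) log((100 - x) / 100)   # VE (%) -> log relative risk

dat$yi  <- to_log_rr(dat$ve)                                       # point estimate on the log scale
dat$sei <- (to_log_rr(dat$ve.lb) - to_log_rr(dat$ve.ub)) / 3.92    # SE from the 95% CI width
```

(The 3.92 is 2 × 1.96, i.e., recovering a standard error from a 95% CI width, assuming that is what the expression is for.)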
The second third is a bootstrapping procedure for a meta-analysis, and the last third is a standard call to metafor::rma followed by calculations that metafor can do automatically. It ends by printing a plain-text list of estimates and CIs.
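Continuing the same invented data, the standard part looks roughly like this; rma already returns the pooled estimate and its CI, and the back-transformation to the VE scale is one line. This is a sketch of typical metafor usage, not the authors' exact call.

```r
library(metafor)

fit <- rma(yi = yi, sei = sei, data = dat, method = "REML")   # random-effects meta-analysis

# Pooled estimate and 95% CI back on the VE scale: VE = 100 * (1 - exp(log RR)).
c(est = 100 * (1 - exp(fit$b[1])),
  lb  = 100 * (1 - exp(fit$ci.ub)),
  ub  = 100 * (1 - exp(fit$ci.lb)))
```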
Given that this is not the literal script used to calculate the tables and figures, I wouldn't know whether the calculations differ for other viruses or years, or what else is in the full script.
This is trivially true, but there is a comment at the top that appears to be left over from the full script and that underscores the problem: “Clean the data set to include studies included in meta-analysis.” It is not followed by code that includes or excludes studies. There's plenty of ambiguity there to cast doubt on any numbers a post-publication reviewer might come up with.
Even if all of this could be excused, the seed is set to 725. Setting a seed is good practice. (Usually it would be a “special” number like 0, 1, 1234, etc., to nominally demonstrate that the author only set it once. Nothing is definitive here unless the seed happens to be very beneficial to the author, and that can't be shown without the full script.)
Every call to rnorm depends on this seed, and the draws each subset receives depend on the number of rows in the data and the order in which the subsets are run. Without the full script that was actually used, random chance can't be ruled out when comparing numbers.
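A small self-contained example of why the order matters (the 725 seed is the only thing taken from the script; the rest is made up):

```r
set.seed(725)
first  <- rnorm(5)    # draws consumed by whatever subset runs first
second <- rnorm(5)    # draws the next subset receives

set.seed(725)
second_if_run_first <- rnorm(5)   # same seed, but this subset now goes first

identical(second, second_if_run_first)   # FALSE: same seed, different "random" numbers
```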
2. The data in a meta-analysis is not private, proprietary, or sensitive. So the “collect data from the figures” defense needs to be rejected for a meta-analysis. According to the paper itself, the authors promise to share the data. This could have been done 1.5 years ago. It could have been done as a response to the existing commentary on this paper (Cowling and Zhong, 2023) that says the data could be used for calculations that would be more interesting to the field. I did not request access and I don't know if the data was given to the authors of that commentary. However, this is another reason to reject the process the authors are proposing.
The forest plots are images, not text. Getting the data out of them would involve manual transcription or an unreliable computer-vision method. I am not convinced this would work even if it were done, and it wouldn't yield comparable numbers, for the reasons in #1.
3. The core function in a meta-analysis is more complex than subtraction, but it is still just a weighted average (see the sketch below). The authors can respond with the unambiguous formula that produced these numbers, along with the input values for this specific table. I suggest the authors show the input values for at least one row that is very different from subtraction and one that is almost exactly subtraction, to cover both cases.
The full script and dataset seem necessary now that I have had to read the script to justify this response. However, the original PubPeer question can be answered with some numbers and a formula.
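To make the “just a weighted average” point concrete, here is the fixed-effect version of that formula with invented numbers. The paper's random-effects model (fitted via metafor) adds a between-study variance term to the weights, but the structure is the same.

```r
# Invented per-study deltas and standard errors for one age-group/virus/season combination.
delta <- c(-14, -6, -10)
se    <- c(5, 3, 8)

w <- 1 / se^2                 # inverse-variance weights
sum(w * delta) / sum(w)       # pooled delta: a weighted average, not a simple subtraction
sqrt(1 / sum(w))              # standard error of the pooled delta
```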
4. There are many methodological issues with the script (a questionable, bespoke bootstrap procedure; selective and premature rounding). None of this gives me more confidence that there's no error.
Even though this is very common, it should be said that the script was not written with the care this domain requires. Almost every part of it rewrites meta-analysis code that already exists in well-tested libraries. The redundancy and statistical flubs suggest it's very possible the full script contains errors.
5. I did run the script on lots of data and the results are no different from subtraction. This is also fairly clear from reading the script.
For these reasons, and the many reasons given in the email exchange with The Lancet, I propose the procedure in #3 above.
Both The Lancet and the authors have done what I think is the right thing by responding. I have many complaints about how it was done, but I have to give them credit for engaging. I hope this can be resolved unambiguously.
References
Cowling, Benjamin J., and Shuyi Zhong. "Repeat vaccination and influenza vaccine effectiveness." The Lancet Respiratory Medicine 11.1 (2023): 2-3.
Hardwicke, Tom E., et al. "Post-publication critique at top-ranked journals across scientific disciplines: a cross-sectional assessment of policies and practice." Royal Society Open Science 9.8 (2022): 220139.
Trisovic, Ana, et al. "A large-scale study on research code quality and execution." Scientific Data 9.1 (2022): 60.