Sunday, October 30, 2022

COVID: The Latest Evidence for a Lab Leak Isn't Great

Ever since COVID-19 burst upon the world in early 2020, there have been people who really want to blame its origin on a Chinese research lab--either as an intentional weapon or an inadvertent leak. For some it's a sincere inquiry into what happened. For others it's an opportunity to score political points or to monetize the issue. And some people are just conspiracy/contrarian-minded or want someone to blame for their frustrations. Just a few days ago the staff of the Republican minority of the Senate Committee on Health, Education, Labor and Pensions released an interim report asserting that the COVID virus most likely originated in the Wuhan Institute of Virology [1]. I may have more to say about the report later, but for now I want to focus on a recent pre-print paper (not peer reviewed) that caused a stir a couple of weeks ago. I don't have the energy to give an in-depth explanation, but I thought it might be worth making some notes for myself and registering my opinion for anyone who might care.

The paper claims to find evidence that SARS-CoV-2 (SC2) was manipulated in a lab based on the distribution of recognition sites of two restriction enzymes that the authors (none of whom are virologists) claim would be useful in piecing the genome of SC2 together to manipulate in a lab. The paper further claims that the distribution of the sites is statistically unlikely, and that by comparison to other coronavirus genomes it appears that the Wuhan scientists removed several recognition sites that would have interfered with the hypothesized strategy of piecing the genome together. At first glance it is a clever approach and I initially thought I would need to revise my opinion about lab manipulation. But there are problems.

1. The two enzymes can be used in a way that does not require their recognition sequence to be in the genome sequence at all, and while the authors of the paper point to prior work of the Wuhan lab to justify focusing on these two particular enzymes, that work used the enzymes in a way that did not incorporate them into the final sequence. Moreover, the first group to publish work piecing the genome together in a lab (at the University of Texas Medical Branch) used these same enzymes, but also in a way that did not require them to be in the actual viral sequence. In other words, even if the Wuhan scientists were tinkering with the original SC2 virus, there is no reason to think they would use the strategy described by this paper.

2. In the prior Wuhan work with a related coronavirus, they adjusted the way they split up the genome sequence such that the spike protein was a single fragment. The paper under discussion here has the spike split between two fragments. If the intent of the Wuhan lab was to tinker with the spike protein, then the strategy of their prior work makes more sense than that proposed in this paper.

3. Two of the enzyme sites that were allegedly eliminated by mutation in order to facilitate the proposed cut-and-paste strategy are also absent from the genome of a bat virus that was not included in the analysis--known as RpYN06--with the same mutations causing the loss. Moreover, the surrounding sequence is also closely related to RpYN06 (something the authors were apparently told several months ago, but ignored). In other words, it's not suspicious at all that SC2 does not have those recognition sites, and it can be explained by the extensive recombination history of the genome.

4. Since large fragments of the genome would be less ideal for cutting and pasting DNA together, the authors did some computer analysis to see how long the longest fragment would be (as a percentage of the genome) when a variety of coronaviruses are cut with different randomly chosen enzymes. They found that SC2 is at the lower end of the distribution, as are other coronaviruses that have been engineered (i.e. altered to allow cutting and pasting of the genome together in the lab), suggesting that SC2's longest fragment in this proposed strategy is statistically unlikely. But their own analysis shows that some known natural viruses are even MORE unlikely. That's kind of like accusing a person of using steroids just because they are strong, when they aren't even the strongest person in the room.

5. The alleged strategy outlined by the paper would involve splitting the genome into 6 fragments, most of which would be several thousand nucleotides long. But one of them would only be 643 nucleotides long, for no apparent reason.
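For anyone who wants to see what the "longest fragment" measure in point 4 amounts to, here is a rough sketch of my own, not the paper's actual code. The two six-letter motifs and the toy sequence are placeholders I picked for illustration, and I simplify by cutting right at the start of each recognition site, which is not how these enzymes necessarily cut in practice.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def site_positions(genome, motifs):
    """Start positions of every motif occurrence, checking both strands."""
    positions = set()
    for motif in motifs:
        # A site on the opposite strand shows up as the reverse complement.
        for pattern in {motif, reverse_complement(motif)}:
            start = genome.find(pattern)
            while start != -1:
                positions.add(start)
                start = genome.find(pattern, start + 1)
    return sorted(positions)


def longest_fragment_fraction(genome, motifs):
    """Longest piece between cut points, as a fraction of genome length.

    Simplification (an assumption): the cut is placed at the first base
    of each recognition site.
    """
    cuts = [0] + site_positions(genome, motifs) + [len(genome)]
    longest = max(b - a for a, b in zip(cuts, cuts[1:]))
    return longest / len(genome)


# Hypothetical motifs and a made-up 46-base "genome" for illustration only.
MOTIFS = ["GGTCTC", "CGTCTC"]
toy_genome = "A" * 10 + "GGTCTC" + "A" * 20 + "CGTCTC" + "A" * 4

sites = site_positions(toy_genome, MOTIFS)
fraction = longest_fragment_fraction(toy_genome, MOTIFS)
print(sites, round(fraction, 3))
```

Run the same calculation over many genomes and many enzyme pairs and you get the kind of distribution the authors describe. The catch is the one noted above: a low value alone doesn't distinguish engineered from natural genomes if some natural genomes score even lower.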

In summary, the strategy hypothesized by the paper doesn't make sense in the face of known alternatives, the existence of RpYN06 undercuts the allegation that Wuhan scientists altered the genome sequence to remove problematic recognition sites, and the statistical analysis is far from any kind of slam-dunk detection of engineering. In the end, it seems this is just another case of anomaly hunting.

Here is a useful Twitter thread that covers a lot of this: LINK

Notes:
1. I've skimmed the report. Reception among prominent virologists is pretty poor thus far. Also, some of the report's allegations of poor facility management are apparently based on incorrect translation of Chinese documents.

