Scientists examined the Nextstrain and GISAID databases and found the L452R mutation in more than 400 SARS-CoV-2 genomes isolated from over 20 countries. They interpret this widespread occurrence as evidence of strong positive selection for the L452R mutation.
The scientists identified the L452R amino acid substitution in the spike protein as the dominant mutation in specimens collected since November 2020. Specifically, they observed that two independent SARS-CoV-2 variants carrying the spike L452R mutation (CAL.20C and CAL.20A) had recently emerged in California. Of these, CAL.20C (clade 20C; lineage B.1.429) has been the predominant variant in California since November 2020. CAL.20A (clade 20A; lineage B.1.232), identified in this study, emerged much more recently than CAL.20C and is circulating mainly in California. Based on phylogenetic analysis, the scientists concluded that the L452R mutation is the primary driving force behind the emergence of both variants. The rising frequency of L452R in recent SARS-CoV-2 variants strongly suggests that, under positive selection, it plays a crucial role in viral adaptive evolution.
Interestingly, they also detected the CAL.20A variant in a gorilla at the San Diego Zoo. This virus carries two additional mutations in non-structural protein 2 (NSP2).
Unlike CAL.20C, CAL.20A has shown no massive clonal expansion. According to the study, CAL.20C carries two spike mutations in addition to L452R that are absent from CAL.20A. The scientists suggest that these additional mutations may amplify the adaptive benefit of L452R, which could explain why CAL.20A has not expanded as rapidly as CAL.20C.
We sequenced SARS-CoV-2 genomes from 688 positive samples collected in Arizona, USA, between December 28, 2020 and March 13, 2021. Of these, 638 yielded high-quality complete genomes, which included variants such as B.1.1.7, B.1.427/B.1.429 and P.2. We detected 7 genomes belonging to a variant derived from the common B.1.243 lineage that had acquired an E484K mutation in the spike protein.
The novel variant had 11 lineage-defining mutations, including V213G and E484K in the spike gene, a 9-nt deletion in ORF1ab (ΔSGF3675-77), a 3-nt insertion in the non-coding intergenic region upstream of the N gene, and other synonymous substitutions (Figure 1A, Supplementary Table 1). These 11 conserved mutations are distinct from the mutations associated with the parental lineage, B.1.243. The parental B.1.243 lineage is a commonly circulating variant in the US that was first observed early in the pandemic, as early as March 2020 (Figure 1B, 96.9%). The B.1.243 parental lineage encodes the spike D614G substitution but none of the other concerning mutations (Figure 1A, Supplementary Table 2). We therefore provisionally designate the new E484K-harboring variant B.1.243.1.
According to an analysis of viral genomes deposited in the global GISAID database, the CAL.20C variant accounts for nearly half of COVID-19 cases in Southern California and about a third of cases statewide.
The researchers also found that by the end of January 2021 the variant had spread to 19 other states, up from five in November 2020. It has also spread beyond the U.S. to six other countries: Australia, Denmark, Israel, New Zealand, Singapore and the United Kingdom.
“We detected a novel strain descended from cluster 20C and defined by five mutations (ORF1a: I4205V, ORF1b: D1183Y, S: S13I; W152C; L452R) (Figure 1). This strain, CAL.20C, was first observed in July 2020 in 1/1230 samples from LA county and not detected in Southern California again until October. Since then, this strain’s prevalence has increased absolutely and relatively in Southern California, where by December it accounted for 24% of all samples (Figure 2A) and 36.4% (66/181) of our local Los Angeles cohort.”