What is a “variant” as it relates to COVID-19? (for non-geneticists)

Colby T. Ford, Ph.D.
Feb 5, 2021

If you watch or read the news, there is a flurry of reports of SARS-CoV-2 variants popping up in the US and around the world. Some reports generalize and might say something like “The South African variant was found in South Carolina” or “The E484 variant was seen in Massachusetts”. This can be confusing, as the media’s coverage is a little vague and maybe a bit misleading.

In this post, I want to bring everyone up to speed about what qualifies as a variant and how it relates to public health in this COVID-19 pandemic.

What is a Variant?

In genomics, definitions of the terms “variant” and “mutation” vary, but I’ll attempt to delineate them and cement their meanings in terms of this virus. (Note: These terms are often used interchangeably; these are my definitions as I think about them.)

Mutation: Usually describes a change in the viral RNA or protein. (For example, “the G nucleotide base at position 123 has now mutated to a T”.)

Variant: Usually characterizes the mutation a bit further as it relates to the viral proteins. (For example, “the D amino acid at position 12 is now a Q”.) This terminology is what most scientists are using to name particular variants (e.g., E484 or N501).

We find these variants using a method called “variant calling”, which compares a new sample’s sequence to a reference sequence and looks at positions in the genome that are different between the two. Most commonly, bioinformaticians are using an early 2020 sequence from Wuhan in Hubei, China (accession: NC_045512) as the reference sequence.

For example, let’s compare the reference sequence to a sequence collected in North Carolina (accession: MT52282). We first align the new sequence to the reference sequence and then look for changes. Notice a G>T change at position 25563 below:

Alignment shown in AliView using MUSCLE.
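
The comparison step can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the two sequences are already aligned to the same length. The function name `call_variants` and the toy sequences are made up for this example. Real variant calling pipelines work from sequencing reads and also handle insertions, deletions, and quality scores.

```python
def call_variants(reference, sample):
    """Return (position, ref_base, sample_base) for each mismatch.

    Assumes both sequences are already aligned and equal in length.
    Positions are 1-based alignment columns.
    Gap characters ('-') and ambiguous bases ('N') are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    skip = {"-", "N"}
    variants = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, new_base) in enumerate(pairs, start=1):
        if ref_base in skip or new_base in skip:
            continue
        if ref_base != new_base:
            variants.append((position, ref_base, new_base))
    return variants


# Toy sequences (not real SARS-CoV-2 data) with a single G>T change.
toy_reference = "ATGCAGTTC"
toy_sample = "ATGCATTTC"
for position, ref_base, new_base in call_variants(toy_reference, toy_sample):
    print(f"{ref_base}>{new_base} at position {position}")
```

Each reported difference is a candidate mutation. Whether it also changes a protein depends on the translation step.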

Then, we can translate these RNA sequences into amino acids to see if this changes the resulting viral protein. Spoiler alert: it does!

Notice the Q⮞H change at position 8521 (which, in terms of the specific protein ORF3a, happens to be position 57). Thus, this variant is called Q57H.
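
The arithmetic behind the name can be sketched in Python. This assumes that ORF3a starts at genome position 25393 in the reference and that the affected reference codon is CAG. Neither value is stated in this post, so treat both as assumptions. `protein_change` is a hypothetical helper written for this example, not a function from any library.

```python
from itertools import product

# Standard genetic code, built in T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def protein_change(genome_pos, orf_start, ref_codon, new_base):
    """Name the amino acid change caused by a single-base substitution.

    genome_pos and orf_start are 1-based genome coordinates.
    ref_codon is the reference codon that contains genome_pos.
    """
    offset = genome_pos - orf_start   # 0-based offset into the gene
    codon_number = offset // 3 + 1    # 1-based amino acid position
    index_in_codon = offset % 3       # 0, 1, or 2
    new_codon = (
        ref_codon[:index_in_codon] + new_base + ref_codon[index_in_codon + 1:]
    )
    return f"{CODON_TABLE[ref_codon]}{codon_number}{CODON_TABLE[new_codon]}"


ORF3A_START = 25393  # assumption: ORF3a start coordinate in NC_045512
print(protein_change(25563, ORF3A_START, "CAG", "T"))
```

Under these assumptions, the G>T change at genome position 25563 falls on the third base of codon 57, turning CAG (Q) into CAT (H), which gives the label Q57H. Setting `orf_start` to 1 instead counts codons from the start of the genome and gives position 8521.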

So, when you see the news talking about E484 or N501, you now know this means that there’s a change in the amino acid of the protein at that location.
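
To show how such a label breaks down, here is a small Python sketch that splits it into its parts. The pattern and the `parse_variant` helper are my own illustration of the naming convention described above, not an official standard or library function.

```python
import re

# One-letter reference amino acid, a position, and an optional replacement.
LABEL_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z*]?)$")


def parse_variant(label):
    """Split a label such as 'Q57H' or 'E484' into its components."""
    match = LABEL_PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a recognizable variant label: {label!r}")
    reference_aa, position, replacement_aa = match.groups()
    return {
        "reference": reference_aa,             # amino acid in the reference
        "position": int(position),             # position within the protein
        "replacement": replacement_aa or None, # None when the label omits it
    }


for label in ["Q57H", "E484", "N501"]:
    print(label, parse_variant(label))
```

A label without a trailing letter, like E484, only says where the change is. A full label like Q57H also says what the amino acid changed to.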

Sequencing is Key

Data is power and, when it comes to understanding the spread of these viral variants, genetic sequences are a goldmine of information. Today, there are two main public repositories for SARS-CoV-2 sequences: NCBI’s GenBank and GISAID. However, despite over half a million sequences being available, geographic representation is still uneven, including across some US states. While we test millions of people, we only sequence a small portion of these samples.

As sequencing becomes more common, we’ll be able to better understand which variants are common geospatially and, for any variants that are a cause for concern, plan accordingly with public health guidance.

SARS-CoV-2 variants in North America as of 2/4/21. Source: janieslab.github.io/sars-cov-2 and Ford et al. (2021)

Not all Variants are Created Equal

When the media talks about a variant, the coverage often implies that it is dangerous or a complete unknown, but this isn’t always the case. Here are some things a variant might change (as it relates to humans):

  1. Transmissibility — Certain variants may increase the interaction between a viral protein and a human protein, making it easier for the virus to infect and spread. One example is a change in the S protein that increases its binding affinity to the human ACE2 protein in our lungs.
  2. Virulence or Pathogenicity — Certain variants may increase the severity of an infection. This could have an impact on the chances of survival or long-term effects of COVID-19.
  3. Vaccine Efficacy or Reinfection — Certain variants may cause the virus to be unrecognizable by your body’s immune system and therefore would require your body to create new antibodies. This means that some mutations may make it possible to be re-infected with SARS-CoV-2 if it is a variant that your body doesn’t recognize with existing antibodies. This directly relates to the efficacy of vaccines. Luckily, vaccine manufacturers have published press releases (see Moderna’s and Pfizer’s posts) around their vaccines’ effectiveness against new variants.
  4. Nothing at all — A mutation in a viral genome may have no effect, or very little effect, on human health.

Co-Occurrence ≟ Travel

One key takeaway here is that just because a variant shows up in two geographical locations at the same time, it doesn’t necessarily mean that someone “hopped on a plane in South Africa and flew to South Carolina” to spread that variant.

One other option is a phenomenon called “convergent evolution” or “homoplasy”. When we put selective pressure like treatments and vaccines on a pathogen, it is possible that the pathogen will mutate in the same genomic location (or mutate in a similar way) in separate geographical locations, because that change helps it survive and proliferate.

The Law of Smaller Numbers

Viral mutations in SARS-CoV-2 will happen regardless of travel restrictions, vaccines, treatments, and other protections. However, limiting the spread will limit the number of times the virus replicates and, in turn, limit the chances of these mutations. Over the next few months, the rollout of vaccines will slow the progression of this pathogen, which will be crucial in slowing mutations as well.
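
The reasoning here is simple probability, and a rough Python sketch makes it visible. It assumes every replication is an independent event with the same small chance of producing a given mutation, which is a simplification. The probability used below is a placeholder chosen for illustration, not a measured value for SARS-CoV-2.

```python
def chance_of_mutation(replications, per_replication_probability):
    """Probability that a given mutation arises at least once.

    Assumes each replication is an independent trial with the same
    probability of producing the mutation (a simplification).
    """
    return 1.0 - (1.0 - per_replication_probability) ** replications


# Placeholder probability for illustration only (not a measured value).
ILLUSTRATIVE_PROBABILITY = 1e-6

for replications in (10_000, 100_000, 1_000_000):
    chance = chance_of_mutation(replications, ILLUSTRATIVE_PROBABILITY)
    print(f"{replications:>9,} replications: {chance:.4f}")
```

Whatever the true per-replication probability is, fewer replications means a lower chance that any particular mutation shows up.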

So, stay home, wear your mask, get a vaccine ASAP, and slow the spread.

#nipitinthebud #staycurious

Cloud genomics and AI guy and aspiring polymath. I am a recovering academic from machine learning and bioinformatics and I sometimes write things here.