Genetic reverse engineering: mRNA SARS-CoV-2 gene therapy

What is gene therapy?

Gene therapy is an experimental technique that uses genes to treat or prevent disease. In the future, this technique may allow doctors to treat a disorder by inserting a gene into a patient’s cells instead of using drugs or surgery. Researchers are testing several approaches to gene therapy, including:

  • Replacing a mutated gene that causes disease with a healthy copy of the gene.
  • Inactivating, or “knocking out,” a mutated gene that is functioning improperly.
  • Introducing a new gene into the body to help fight a disease.

Although gene therapy is a promising treatment option for a number of diseases (including inherited disorders, some types of cancer, and certain viral infections), the technique remains risky and is still under study to make sure that it will be safe and effective. Gene therapy is currently being tested only for diseases that have no other cures.

The BNT162b2 mRNA gene therapy has this digital code at its heart. It is 4284 characters long, so it would fit in a bunch of tweets. At the very beginning of the vaccine production process, someone uploaded this code to a DNA printer (yes), which then converted the bytes on disk to actual DNA molecules.


Out of such a machine come tiny amounts of DNA, which after a lot of biological and chemical processing end up as RNA (more about which later) in the vaccine (gene therapy) vial. A 30 microgram dose turns out to actually contain 30 micrograms of RNA. In addition, there is a clever lipid (fatty) packaging system that gets the mRNA into our cells.

RNA is the volatile ‘working memory’ version of DNA. DNA is like the flash drive storage of biology. DNA is very durable, internally redundant and very reliable. But much like computers do not execute code directly from a flash drive, before something happens, code gets copied to a faster, more versatile yet far more fragile system.

For computers, this is RAM; for biology, it is RNA. The resemblance is striking. Unlike flash memory, RAM degrades very quickly unless lovingly tended to. The reason the Pfizer/BioNTech mRNA vaccine (gene therapy) must be stored in the deepest of deep freezers is the same: RNA is a fragile flower.

Each RNA character weighs on the order of 0.53·10⁻²¹ grams, meaning there are around 6·10¹⁶ characters in a single 30 microgram vaccine dose. Expressed in bytes, this is around 14 petabytes, although it must be said this consists of around 13,000 billion repetitions of the same 4284 characters. The actual informational content of the vaccine is just over a kilobyte. SARS-CoV-2 itself weighs in at around 7.5 kilobytes.
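The arithmetic behind these figures can be checked with a short Python sketch. It assumes, as the text does, 2 bits per RNA character, and it uses decimal units (1 petabyte = 10¹⁵ bytes). The variable names are made up for illustration.

```python
# Back-of-the-envelope check of the dose figures quoted in the text.
DOSE_GRAMS = 30e-6          # 30 microgram dose
GRAMS_PER_CHAR = 0.53e-21   # approximate weight of one RNA character
SEQ_LEN = 4284              # characters in the vaccine sequence
BITS_PER_CHAR = 2           # assumption: 4 possible characters = 2 bits each

chars_in_dose = DOSE_GRAMS / GRAMS_PER_CHAR
bytes_in_dose = chars_in_dose * BITS_PER_CHAR / 8
copies = chars_in_dose / SEQ_LEN
info_bytes = SEQ_LEN * BITS_PER_CHAR / 8

print(f"characters per dose: {chars_in_dose:.2e}")
print(f"petabytes per dose:  {bytes_in_dose / 1e15:.1f}")
print(f"copies of the code:  {copies:.2e}")
print(f"unique information:  {info_bytes:.0f} bytes")
```

This gives roughly 5.7·10¹⁶ characters, about 14 petabytes, about 1.3·10¹³ copies, and 1071 bytes of unique information, in line with the rounded numbers above.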

The briefest bit of background

DNA is a digital code. Unlike computers, which use 0 and 1, life uses A, C, G and U/T (the ‘nucleotides’, ‘nucleosides’ or ‘bases’).

In computers we store the 0 and 1 as the presence or absence of a charge, or as a current, as a magnetic transition, or as a voltage, or as a modulation of a signal, or as a change in reflectivity. Or in short, the 0 and 1 are not some kind of abstract concept - they live as electrons and in many other physical embodiments.

In nature, A, C, G and U/T are molecules, stored as chains in DNA (or RNA).

In computers, we group 8 bits into a byte, and the byte is the typical unit of data being processed.

Nature groups 3 nucleotides into a codon, and this codon is the typical unit of processing. A codon contains 6 bits of information (2 bits per character × 3 characters), which means there are 2⁶ = 64 different codon values.
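To make the bit counting concrete, here is a minimal Python sketch that enumerates all codons and packs one into a 6-bit number. The mapping A=0, C=1, G=2, U=3 is an arbitrary assumption chosen for illustration; nature does not number its bases.

```python
from itertools import product

BASES = "ACGU"  # assumed ordering: A=0, C=1, G=2, U=3 (2 bits each)

# Every possible 3-character codon: 4 * 4 * 4 = 64 of them.
codons = ["".join(c) for c in product(BASES, repeat=3)]

def codon_to_int(codon: str) -> int:
    """Pack a codon into a 6-bit integer, 2 bits per character."""
    value = 0
    for base in codon:
        value = (value << 2) | BASES.index(base)
    return value

print(len(codons))
print(codon_to_int("AAA"), codon_to_int("UUU"))
```

The smallest codon value is 0 and the largest is 63, so every codon fits in exactly 6 bits.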

Pretty digital so far. When in doubt, head to the WHO document with the digital code to see for yourself.

Some further reading is available here - this link (‘What is life’) might help make sense of the rest of this page. Or, if you like video, I have two hours for you.

So what does that code DO?

The idea of a vaccine is to teach our immune system how to fight a pathogen, without us actually getting ill. Historically this has been done by injecting a weakened or incapacitated (attenuated) virus, plus an ‘adjuvant’ to scare our immune system into action. This was a decidedly analogue technique involving billions of eggs (or insects). It also required a lot of luck and loads of time. Sometimes a different (unrelated) virus was also used.

An mRNA vaccine achieves the same thing (‘educate our immune system’) but in a laser-like way. And I mean this in both senses - very narrow but also very powerful.

So here is how it works. The injection contains volatile genetic material that describes the famous SARS-CoV-2 ‘Spike’ protein. Through clever chemical means, the vaccine manages to get this genetic material into some of our cells.

These then dutifully start producing SARS-CoV-2 Spike proteins in large enough quantities that our immune system springs into action. Confronted with Spike proteins, and (importantly) tell-tale signs that cells have been taken over, our immune system develops a powerful response against multiple aspects of the Spike protein AND the production process.

And this is what gets us to the 95% effective vaccine.

The source code!

Let’s start at the very beginning, a very good place to start. The WHO document has this helpful picture:

This is a sort of table of contents. We’ll start with the ‘cap’, actually depicted as a little hat.

Much like you can’t just plonk opcodes in a file on a computer and run it, the biological operating system requires headers, has linkers and things like calling conventions.

The code of the vaccine starts with the following two nucleotides:

GA

This can be compared very much to every DOS and Windows executable starting with MZ, or UNIX scripts starting with #!. In both life and operating systems, these two characters are not executed in any way. But they have to be there because otherwise nothing happens.
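The computing side of this analogy is easy to show. The sketch below looks at the first two bytes of some data, much as an operating system loader would. The function names `sniff` and `looks_capped` are hypothetical, and the RNA check is only a toy: the real cap is a chemical modification, not just two letters.

```python
def sniff(first_bytes: bytes) -> str:
    """Guess a file type from its first two bytes (hypothetical helper)."""
    magic = first_bytes[:2]
    if magic == b"MZ":
        return "DOS/Windows executable"
    if magic == b"#!":
        return "UNIX script"
    return "unknown"

def looks_capped(rna: str) -> bool:
    """Toy analogue: does the sequence start with the two cap characters?"""
    return rna.startswith("GA")

print(sniff(b"#!/bin/sh\n"))
print(looks_capped("GAUUACA"))  # made-up sequence, not the vaccine's
```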

The mRNA ‘cap’ has a number of functions. For one, it marks code as coming from the nucleus. In our case, of course, it doesn’t: our code comes from a vaccination. But we don’t need to tell the cell that. The cap makes our code look legit, which protects it from destruction.

The initial two GA nucleotides are also chemically slightly different from the rest of the RNA. In this sense, the GA has some out-of-band signaling on it.

The “five-prime untranslated region”

Some lingo here. RNA molecules can only be read in one direction. Confusingly, the part where the reading begins is called the 5’ or ‘five-prime’ end. The reading stops at the 3’ or three-prime end.

Life consists of proteins (or things made by proteins). And these proteins are described in RNA. When RNA gets converted into proteins, this is called translation.

Here we have the 5’ untranslated region (‘UTR’), so this bit does not end up in the protein:


Here we encounter our first surprise. The normal RNA characters are A, C, G and U. U is also known as ‘T’ in DNA. But here we find a Ψ. What is going on?

This is one of the exceptionally clever bits about the vaccine. Our body runs a powerful antivirus system (“the original one”). For this reason, cells are extremely unenthusiastic about foreign RNA and try very hard to destroy it before it does anything.

This is somewhat of a problem for our vaccine - it needs to sneak past our immune system. Over many years of experimentation, it was found that if the U in RNA is replaced by a slightly modified molecule, our immune system loses interest. For real.

So in the BioNTech/Pfizer vaccine, every U has been replaced by 1-methyl-3’-pseudouridylyl, denoted by Ψ. The really clever bit is that although this replacement Ψ placates (calms) our immune system, it is accepted as a normal U by relevant parts of the cell.
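Seen as a string operation, the substitution looks like the sketch below. The sequence `GAUUACA` is a made-up toy example, not taken from the vaccine, and both function names are hypothetical.

```python
def pseudouridylate(rna: str) -> str:
    """Replace every U with Ψ, as described for the vaccine's RNA."""
    return rna.replace("U", "Ψ")

def as_read_by_cell(modified_rna: str) -> str:
    """The relevant cell machinery treats Ψ as an ordinary U."""
    return modified_rna.replace("Ψ", "U")

toy = "GAUUACA"  # toy sequence for illustration only
modified = pseudouridylate(toy)
print(modified)
print(as_read_by_cell(modified) == toy)
```

The modified string no longer contains any U, yet reading it back yields the original message unchanged. That is the whole trick in miniature.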

In computer security we also know this trick - it sometimes is possible to transmit a slightly corrupted version of a message that confuses firewalls and security solutions, but that is still accepted by the backend servers - which can then get hacked.

We are now reaping the benefits of fundamental scientific research performed in the past. The discoverers of this Ψ technique had to fight to get their work funded and then accepted. We should all be very grateful, and I am sure the Nobel prizes will arrive in due course.

Many people have asked, could viruses also use the Ψ technique to beat our immune systems? In short, this is extremely unlikely. Life simply does not have the machinery to build 1-methyl-3’-pseudouridylyl nucleotides. Viruses rely on the machinery of life to reproduce themselves, and this facility is simply not there. The mRNA vaccines quickly degrade in the human body, and there is no possibility of the Ψ-modified RNA replicating with the Ψ still in there. “No, Really, mRNA Vaccines Are Not Going To Affect Your DNA” is also a good read.

Ok, back to the 5’ UTR. What do these 51 characters do? As with everything in nature, almost nothing has one clear function.

When our cells need to translate RNA into proteins, this is done using a machine called the ribosome. The ribosome is like a 3D printer for proteins. It ingests a strand of RNA and based on that it emits a string of amino acids, which then fold into a protein.
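In code, the ribosome's job can be sketched as a lookup loop over codons. The sketch below uses the standard genetic code with one-letter amino acid names, where `*` marks a stop codon. It is a simplification under stated assumptions: it starts reading at the first character, treats Ψ as U, and ignores everything the real ribosome does to find its starting point. The input is a short made-up message, not the vaccine's sequence.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order; '*' means stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(rna: str) -> str:
    """Read RNA three characters at a time and emit amino acids."""
    rna = rna.replace("Ψ", "U")  # Ψ is accepted as a normal U
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":    # stop codon: the ribosome lets go
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AΨGΨΨCGΨGΨΨCCΨGGΨGΨGA"))
```

The seven codons in this toy message produce six amino acids (`MFVFLV`), and the final codon tells the loop to stop.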