Using AI for scientific discovery

Inside every cell in your body, billions of tiny molecular machines are hard at work. They’re what allow your eyes to detect light, your neurons to fire, and the ‘instructions’ in your DNA to be read, which make you the unique person you are.

These exquisite, intricate machines are proteins. They underpin not just the biological processes in your body but every biological process in every living thing. They’re the building blocks of life.

Currently, there are around 200 million known proteins, with another 30 million found every year. Each one has a unique 3D shape that determines how it works and what it does.

But figuring out the exact structure of a protein remains an expensive and often time-consuming process, meaning we only know the exact 3D structure of a tiny fraction of the proteins known to science.

Finding a way to close this rapidly expanding gap and predict the structure of millions of unknown proteins could not only help us tackle disease and more quickly find new medicines but perhaps also unlock the mysteries of how life itself works.

What is the protein folding problem?

Proteins are large, complex molecules essential to all of life. Nearly every function that our body performs - contracting muscles, sensing light, or turning food into energy - relies on proteins, and how they move and change. What any given protein can do depends on its unique 3D structure. For example, antibody proteins utilised by our immune systems are ‘Y-shaped’, and form unique hooks.

By latching on to viruses and bacteria, these antibody proteins are able to detect and tag disease-causing microorganisms for elimination. Collagen proteins are shaped like cords, which transmit tension between cartilage, ligaments, bones, and skin. Other types of proteins include Cas9, which, using CRISPR sequences as a guide, acts like scissors to cut and paste sections of DNA; antifreeze proteins, whose 3D structure allows them to bind to ice crystals and prevent organisms from freezing; and ribosomes, which act like a programmed assembly line, helping to build proteins themselves.

The recipes for those proteins - called genes - are encoded in our DNA. An error in the genetic recipe may result in a malformed protein, which could result in disease or death for an organism. Many diseases, therefore, are fundamentally linked to proteins. But just because you know the genetic recipe for a protein doesn’t mean you automatically know its shape.

Proteins are composed of chains of amino acids (also referred to as amino acid residues). But DNA only contains information about the sequence of amino acids - not how they fold into shape. The bigger the protein, the more difficult it is to model, because there are more interactions between amino acids to take into account.

As demonstrated by Levinthal’s paradox, it would take longer than the age of the known universe to randomly enumerate all possible configurations of a typical protein before reaching the true 3D structure - yet proteins themselves fold spontaneously, within milliseconds. Predicting how these chains will fold into the intricate 3D structure of a protein is what’s known as the “protein folding problem” - a challenge that scientists have worked on for decades.

This unsolved problem has already inspired many developments, from IBM’s efforts in supercomputing (BlueGene) and novel citizen science efforts (Folding@Home and FoldIt) to new engineering realms, such as rational protein design.

