Why most of your DNA is garbage

We’ve all heard that our DNA code is what makes us unique. It determines our height, health, and hair colour. It even shapes our personality to some extent.

You’d expect that, given this stuff controls almost everything about how our bodies develop and work, its 3 billion letter-long code (equivalent to only about 0.75 gigabytes at 2 bits per letter!) would have its work cut out for it. But as it turns out, efficiency and compactness don’t seem to be DNA’s number one priority.
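For the curious, the storage figure is a quick back-of-the-envelope calculation. Here is a minimal sketch, assuming each of the four letters is stored in 2 bits with no compression and that a gigabyte means a billion bytes. The function name and both assumptions are illustrative, and the result scales directly with the bits assumed per letter.

```python
import math


def genome_size_gb(n_letters, n_symbols=4):
    """Rough storage needed for a sequence, in gigabytes (10^9 bytes).

    Assumption: every letter is stored with a fixed-width code,
    so 4 possible letters need log2(4) = 2 bits each.
    """
    bits_per_letter = math.log2(n_symbols)
    total_bytes = n_letters * bits_per_letter / 8
    return total_bytes / 1e9


# Roughly 3 billion letters, as quoted in the text.
size = genome_size_gb(3_000_000_000)
print(f"{size:.2f} GB")
```

With 2 bits per letter this comes to 0.75 GB, which would fit on an old CD with a little squeezing.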

What we expect

To start, how is the information in DNA used anyway? The “rungs” of the iconic double helix ladder of DNA are base pairs, built from four bases: A, T, C, and G. A common explanation is that these base pairs are “read” by machinery in the cell and ultimately translated into instructions to make proteins. These proteins are then constructed and go on to carry out their various functions, like building pretty much our entire body and keeping us from dying.

So that’s what DNA does, right? Well, only about 1.5% of it does. To be exact, this is the percentage that actually encodes protein sequence: the so-called “exons”. About 26% of the genome is the associated “introns”, which are technically part of protein-coding genes but don’t encode any of the protein itself.

Almost 99% of our genome does not in fact code for proteins. But this doesn’t mean it’s entirely useless (yes, the title is a bit of an exaggeration). Let’s see what that 99% is really up to.

Photo credit: Polygon Medical Animation on VisualHunt / CC BY-NC-ND

Lots of regulation

In the nucleus of the cell, where DNA lives, biomolecular machines scan through its base pairs and synthesise a molecule called messenger RNA (mRNA). This is a temporary “recipe” molecule that can be read to create a desired protein. But it would be quite chaotic if mRNA were constantly being produced from every part of the genome, leading to an unfiltered avalanche of every protein. That’s where regulatory sequences come in.
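To make the “recipe” idea concrete, here is a toy sketch of the two steps: copying a stretch of DNA into mRNA, then reading the mRNA three letters at a time to pick amino acids. It is a simplification under stated assumptions: the input is treated as the coding strand (so copying just swaps T for U), the codon table is a tiny hand-picked subset of the real one, and the example sequence is made up.

```python
# A small subset of the standard genetic code (the real table has 64 codons).
CODON_TABLE = {
    "AUG": "Met",   # also the usual start signal
    "UUU": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "UAA": "STOP",
}


def transcribe(coding_strand):
    """Copy DNA into mRNA: same letters, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read mRNA in three-letter codons until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "???")  # unknown codon
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


mrna = transcribe("ATGTTTGGCAAATAA")  # made-up example sequence
print(mrna)
print(translate(mrna))
```

Real cells add plenty of extra steps (cutting out introns, for one), but the core lookup really is this simple.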

Aside from encoding a recipe, DNA can act as a “bar code”. Special proteins can read particular sequences and interpret them as signals to enhance or inhibit protein expression.

Regulatory sequences also mark where the genes (sections of DNA which code for a protein) and other features are physically located. Without them, the transcribing machinery wouldn’t know where to go.
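The “bar code” idea boils down to spotting a known pattern in a long string of letters. Here is a minimal sketch, assuming a single fixed motif; TATAAA, a common form of the “TATA box” found near the start of many genes, is used purely as an illustration. The function name and the example sequence are made up, and real regulatory proteins tolerate far fuzzier matches than an exact string search.

```python
def find_motif(sequence, motif):
    """Return every position where the motif occurs (overlaps allowed)."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


# Made-up sequence containing two copies of the motif.
dna = "GGCTATAAAGGCATGTTTGGCTATAAATAA"
print(find_motif(dna, "TATAAA"))
```

Each hit is a spot where, in this toy picture, a regulatory protein could latch on and switch a nearby gene up or down.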

Depending on how the regulatory portion of DNA is defined and measured, it makes up between 8% and 20% of the whole genome. This relatively large number tells us that it is the regulation of expression, not just the coding genes, that is crucial for our fine-tuned complexity.

Other bits and pieces

A small part of the genome is actually structural. There are telomeres, which serve as protective caps at the ends of each DNA molecule, and centromeres, which are involved in cell division.

Remember messenger RNA from before? There are myriad other types of RNA products that DNA encodes that don’t go on to make proteins. Together these are grouped as non-coding RNA (ncRNA). While a good portion of ncRNA does serve important functions, most of it is probably useless.

Photo credit: Alglascock [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)]

Repetition repetition repetition

Around 50% of the genome can be broadly described as repetitive sequences. This includes the telomeres and centromeres from before, as well as many back-to-back repeats of the same short sequences, known as tandem repeats.

But the vast majority of repetitive sequences, and almost 45% of the entire genome, fall under the heading of transposons, or mobile genetic elements. While these sequences come in different flavours, they are united in their ability to independently “move” by copying and pasting themselves into different parts of the genome (over evolutionary time, for the most part).

This might sound a bit creepy, but they mostly don’t do very much. Some of these were born from transformed human genes, while others are ancient remnants of virus DNA that got incorporated into our own genome long ago. Pretty spooky.

One particular transposon sequence, called Alu, is believed to have around 1,000,000 copies – that’s 10% of the whole genome!
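That percentage is easy to sanity-check. Here is a rough sketch, assuming each Alu copy is about 300 letters long (a typical figure, but not one given above) and using the 3 billion letter genome from earlier; the variable names are illustrative.

```python
GENOME_LENGTH = 3_000_000_000  # letters, as quoted earlier in the article
ALU_COPIES = 1_000_000         # approximate copy number from the text
ALU_LENGTH = 300               # assumed typical length of one Alu copy

alu_letters = ALU_COPIES * ALU_LENGTH
alu_fraction = alu_letters / GENOME_LENGTH

print(f"Alu letters: {alu_letters:,}")
print(f"Share of genome: {alu_fraction:.0%}")
```

Under those assumptions, a single short sequence pasted a million times accounts for about 300 million letters, roughly a tenth of the genome.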

In conclusion…

The debate on what the non-coding genome is made up of and what function it serves, if any, is ongoing in the genetics and molecular biology community. The human DNA code is still full of mystery, but there is a trend: what first appears to be junk is often discovered to have some sort of biological function after all.

So while the numbers are quite approximate and the scientific consensus on what is junk or not could change tomorrow, the key takeaway is that your DNA is messier than you might expect.