DNA is ridiculously good at storing information. A one-milliliter droplet of DNA can theoretically store as much information as two Walmarts full of data servers. What’s more, DNA can be stored at room temperature for hundreds of thousands of years. If your gears are turning right now, you’re not alone.
However, using DNA to store information is not nearly as straightforward as saving it to a flash drive. In fact, encoding information in the blueprint of life and decoding it again can be a nightmare, but science is making great strides.
In a new study, researchers at the University of Texas have employed a new technique for storing and reading information encoded in the iconic double-helix “twisted ladder”.
The researchers demonstrated their novel technique by encoding the entire text of “The Wizard of Oz,” translated into Esperanto, with what they describe as unprecedented accuracy and efficiency.
“The key breakthrough is an encoding algorithm that allows accurate retrieval of the information even when the DNA strands are partially damaged during storage,” said Ilya Finkelstein, an associate professor of molecular biosciences and one of the authors of the study.
DNA: 5 million times more efficient than any storage medium employed today
The blueprint for every cell in our bodies, and even for our instincts, is encoded in sequences of DNA’s four nucleotide bases: adenine (A), thymine (T), guanine (G), and cytosine (C). Ever since James Watson and Francis Crick described DNA’s double-helix structure in 1953 (drawing on the work of the largely uncredited Rosalind Franklin), scientists have recognized that huge quantities of data could be stored at high density in only a few molecules.
In theory, just one gram of DNA could hold the entirety of human knowledge, which is why some are keen on using the blueprint of life as the ultimate time capsule.
DNA can also remain stable for a very long time: in a recent study, researchers recovered DNA from a 430,000-year-old human ancestor found in a cave in Spain.
Scientists have been storing all sorts of information in DNA for years, particularly over the past decade. In 2017, researchers at the New York Genome Center (NYGC) stored a full computer operating system, the 1895 French film “Arrival of a Train at La Ciotat,” a $50 Amazon gift card, a computer virus, a Pioneer plaque, and a 1948 study by information theorist Claude Shannon in 72,000 DNA strands, each 200 bases long.
However, we’re still a long way from using DNA as a reliable storage medium. For one, synthesizing and reading DNA is prohibitively expensive.
The biggest impediment, however, is the fact that DNA is highly prone to errors.
Unlike errors in computer data, which tend to show up as blanks, errors in DNA sequences appear as insertions or deletions. This is a serious problem because such errors shift the whole sequence that follows, with no blank spaces to alert us.
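To see why a missing base is so much worse than a misread one, consider a toy example. The sketch below assumes a hypothetical mapping of two bits per base (A=00, C=01, G=10, T=11); it is an illustration only, not the encoding used in the study, and the function names are made up for this example.

```python
# Toy illustration: a substitution damages one byte, a deletion shifts
# every byte that follows it. The 2-bits-per-base mapping is an assumption.
BASES = "ACGT"  # hypothetical mapping: A=00, C=01, G=10, T=11


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Read the strand four bases at a time; leftover bases are dropped."""
    usable = len(strand) - len(strand) % 4
    out = bytearray()
    for i in range(0, usable, 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def mismatched_bytes(original: bytes, recovered: bytes) -> int:
    """Count differing positions, plus any difference in length."""
    differing = sum(a != b for a, b in zip(original, recovered))
    return differing + abs(len(original) - len(recovered))


message = b"The Wizard of Oz"
strand = encode(message)

# Substitution: one base is misread as a different base.
swap = "A" if strand[8] != "A" else "C"
substituted = strand[:8] + swap + strand[9:]

# Deletion: the same base goes missing, so everything after it shifts.
deleted = strand[:8] + strand[9:]

print("substitution:", mismatched_bytes(message, decode(substituted)), "bad byte(s)")
print("deletion:    ", mismatched_bytes(message, decode(deleted)), "bad byte(s)")
```

In this toy setup the substitution corrupts a single byte, while the deletion garbles the text from the point of damage onward, and nothing in the strand itself flags where the shift began.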
To account for the errors inherent in DNA, researchers have previously had to repeat each piece of information 10 to 15 times. The repeated copies can then be compared to detect insertions or deletions.
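The repetition approach can be sketched roughly as follows. This is a deliberately simplified version, assuming the expected strand length is known: copies whose length reveals an insertion or deletion are thrown out, and the remaining copies vote on each base. Real pipelines align the reads instead of discarding them, and the `consensus` helper here is hypothetical.

```python
from collections import Counter


def consensus(reads: list[str], expected_length: int) -> str:
    """Majority-vote each position across copies of the expected length.

    Copies that are too short or too long (a sign of a deletion or an
    insertion) are discarded rather than aligned, which is a simplification.
    """
    same_length = [read for read in reads if len(read) == expected_length]
    if not same_length:
        raise ValueError("no copy has the expected length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*same_length)
    )


original = "ACGTTGCAAC"
reads = [
    "ACGTTGCAAC",   # intact copy
    "ACGATGCAAC",   # substitution at position 3
    "ACGTGCAAC",    # deletion (one base short, discarded)
    "ACGTTTGCAAC",  # insertion (one base long, discarded)
    "ACGTTGCTAC",   # substitution at position 7
    "ACGTTGCAAC",   # intact copy
]

recovered = consensus(reads, len(original))
print(recovered == original)
```

The cost is plain from the example: six copies are stored and read to recover a single ten-base message, which is the overhead the new method aims to avoid.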
But because of the way the team at the University of Texas chose to store information, no repetitions are needed.
“We found a way to build the information more like a lattice,” said Stephen Jones, a research scientist who collaborated on the project with Finkelstein. “Each piece of information reinforces other pieces of information. That way, it only needs to be read once.”
To demonstrate the reliability of their method, Finkelstein’s team encoded “The Wizard of Oz” into DNA, which they then subjected to high temperatures and extreme humidity.
Naturally, the DNA strands became damaged, but all of the information could still be read successfully. This marks a big step on the long road to DNA data storage.
“We tried to tackle as many problems with the process as we could at the same time,” said John Hawkins, co-author of the new study and a Ph.D. alumnus of the Oden Institute for Computational Engineering and Sciences at the University of Texas.
“What we ended up with is pretty remarkable.”
The method was described in the Proceedings of the National Academy of Sciences.