

Researchers encode data in DNA hundreds of times faster than before — with panda pics

Two images were stored in and retrieved from DNA sequences faster than ever before. This could be a game-changer for our data storage.

Mihai Andrei
October 24, 2024 @ 6:23 pm


DNA could soon become a reliable medium of storage. AI-generated image.

DNA can hold a staggering amount of information. Not only is it the blueprint for all life on Earth, but a single gram of DNA can store the equivalent of 215 million gigabytes of data. That’s enough to hold every digital book, song, and movie ever created. Gram for gram, DNA can store up to a billion times more data than silicon-based storage.

The traditional method of storing data in DNA involves encoding binary information (the ones and zeros of computing) into sequences of nucleotide bases, namely adenine (A), thymine (T), guanine (G) and cytosine (C), and then synthesizing these sequences chemically. This method is promising, but it is hampered by high costs and slow writing speeds. The new study sidesteps these limitations by introducing a method that encodes data without synthesizing new DNA sequences.
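To make the conventional, sequence-based encoding concrete, here is a minimal Python sketch. The two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration; they are not the scheme used in this study, and real systems add constraints and error correction on top.

```python
# Illustrative mapping only: two bits per nucleotide base (an assumption).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes(data: bytes) -> str:
    """Turn bytes into a base sequence, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bases(sequence: str) -> bytes:
    """Invert encode_bytes: four bases make up one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


sequence = encode_bytes(b"Hi")
print(sequence)                # CAGACGGC
print(decode_bases(sequence))  # b'Hi'
```

In the traditional approach, a sequence like this would then have to be synthesized chemically, base by base, which is the slow and costly step.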

The research team, led by Cheng Zhang, developed a method that uses epigenetic modifications to encode data. Epigenetic modifications are chemical changes to DNA that do not alter its sequence but can influence its function. One common type is DNA methylation, in which methyl groups are added to cytosine bases in the DNA sequence.

“It’s encouraging to see that epigenetic principles from biochemistry textbooks and taught in my classroom can be applied seamlessly to DNA data storage applications to solve some of the unmet challenges in this field,” says corresponding author Hao Yan.

How it all works

The team’s approach essentially “prints” data onto DNA using these methylation marks as binary data bits, or “epi-bits.” By using a library of prefabricated DNA templates and short DNA strands known as bricks, the researchers could guide where methyl groups are placed on the DNA, allowing them to encode complex information without having to synthesize new DNA molecules from scratch.

One of the most remarkable features of this new approach is its ability to write data in parallel. Traditional DNA synthesis is a serial process — each nucleotide must be added one at a time, which is time-consuming and costly. However, the new system allows the researchers to add multiple epi-bits of information simultaneously, increasing the speed and scalability of data storage.

Think of writing a letter by hand: you form each character one at a time, which is slow. A printer, by contrast, lays down an entire row at once, which is much faster.
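The idea of writing many epi-bits at once can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the authors' protocol: it assumes one pre-made brick per site on the template, either carrying a methyl mark (bit 1) or not (bit 0), and the names `select_bricks` and `read_epibits` are hypothetical.

```python
from typing import List, Tuple

# (site index on the template, whether the brick carries a methyl mark)
Brick = Tuple[int, bool]


def select_bricks(bits: List[int]) -> List[Brick]:
    """Pick one pre-made brick per site; all go into the same reaction."""
    return [(site, bit == 1) for site, bit in enumerate(bits)]


def read_epibits(bricks: List[Brick], n_sites: int) -> List[int]:
    """Recover the bits from the methylation state seen at each site."""
    state = [0] * n_sites
    for site, methylated in bricks:
        state[site] = 1 if methylated else 0
    return state


message = [1, 0, 1, 1, 0, 0, 1, 0]
reaction = select_bricks(message)            # one mix, all sites at once
recovered = read_epibits(reaction, len(message))
print(recovered == message)                  # True
```

The point of the sketch is that the write step is a single selection of existing parts rather than a base-by-base synthesis, so the number of chemical steps does not grow with the number of bits in the way it does for serial synthesis.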

“This new approach demonstrates how one can harness molecular mechanisms for innovative data solutions, bridging the fields of biology and digital information,” says Laura Na Liu, a co-author of the new study.

Coding panda pics into DNA

Recovered tiger images from samples 1 to 4 with stepwise improved writing-reading pipelines.

The team tested their approach by storing an image of a panda and a rubbing in the shape of a tiger from ancient China. They then retrieved them with a DNA sequencer.

In their experiments, the researchers stored approximately 275,000 bits of information using their new system (about 34 kilobytes). They achieved this by employing a set of 700 DNA “movable types” (i.e., pre-made short DNA sequences) and five universal DNA templates. This allowed them to write 350 bits of data in a single reaction, a significant improvement over traditional methods. The approach was also reliable, with high fidelity and error rates below 3%.
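The figures above are easy to put in perspective with some quick arithmetic. The short Python sketch below converts the reported bit count to bytes and estimates how many reactions would be needed, assuming (our simplification, not a figure from the study) that every reaction writes the full 350 bits.

```python
import math

TOTAL_BITS = 275_000        # reported amount of stored data
BITS_PER_REACTION = 350     # reported bits written per reaction

total_bytes = TOTAL_BITS / 8
total_kilobytes = total_bytes / 1000

# Assumption: each reaction is filled to its 350-bit capacity.
reactions_needed = math.ceil(TOTAL_BITS / BITS_PER_REACTION)

print(total_bytes)        # 34375.0
print(total_kilobytes)    # 34.375
print(reactions_needed)   # 786
```

So 275,000 bits is roughly 34 kilobytes, about the size of a small compressed image.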

Compression and error correction coding scheme for panda image (i), and a schematic of the retrieved epi-bits on sequencing reads along with the restored image (ii).

To ensure that the data stored using epigenetic modifications could be accurately read, the researchers used high-throughput nanopore sequencing, a technology that reads DNA sequences by passing them through a tiny pore and detecting changes in electrical current.

The researchers also demonstrated another aspect of their technology: its accessibility. They conducted a pilot experiment called “iDNAdrive,” in which 60 student volunteers with no professional biolab experience successfully encoded their own data into DNA using a simple kit. This suggests that the system is not only scalable but also user-friendly.

This marks a significant departure from earlier DNA data storage methods, which could only be carried out in a lab. In such a distributed system, users could “write” data to DNA in their own homes and then retrieve it later through sequencing.

Big promise, big challenges

This research highlights the incredible potential of DNA as a medium for storing vast amounts of data in a compact, stable, and durable form. The innovative use of epigenetic modifications to encode data provides a new way to overcome the limitations of traditional DNA synthesis methods.

DNA is much more stable than silicon and other traditional storage media. Properly stored, DNA can last for thousands of years, making it ideal for archival purposes, such as preserving cultural artifacts, historical records, or scientific data. This method’s potential for distributed data storage could revolutionize personal data privacy and security. Instead of relying on cloud storage or data centers, individuals could store their most sensitive information in DNA, which could be kept in a secure location and accessed only when needed.

However, there are also enormous challenges ahead. For starters, only very small amounts of information were stored, and the error rates, while relatively low (<3%), are not acceptable for data we work with routinely.
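To see why an error rate of a few percent is a problem, consider a rough back-of-the-envelope calculation. The Python sketch below treats 3% as a per-bit error rate with independent errors and no error correction applied; both are simplifying assumptions for illustration, not details taken from the study.

```python
ERROR_RATE = 0.03        # assumed per-bit error rate (the reported upper bound)
BITS_STORED = 275_000    # reported amount of stored data

# Expected number of wrong bits across the whole data set.
expected_bit_errors = ERROR_RATE * BITS_STORED   # about 8,250

# Chance that a 1-kilobyte file (8,000 bits) is read back with no error at all.
file_bits = 8 * 1000
p_error_free = (1 - ERROR_RATE) ** file_bits

print(round(expected_bit_errors))
print(p_error_free)   # vanishingly small
```

Under these assumptions, thousands of bits would be wrong, and the chance of reading even a small file perfectly is effectively zero. That is why raw error rates at this level have to be paired with error-correcting codes before the data can be trusted.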

Another challenge is the speed of data retrieval. Although nanopore sequencing allows for high-throughput reading of DNA, it is still slower than the reading speeds of conventional digital storage devices. Advances in sequencing technology will be crucial to making DNA data storage competitive with silicon-based systems.

The study “Parallel molecular data storage by printing epigenetic bits on DNA” was published in Nature.
