Increasingly, civilization’s information is stored digitally, and that storage is abundant and growing. We don’t bother deleting those seven high-definition videos of the ceiling or 20 blurry photos of a table corner taken by our kid. There’s plenty of room on a smartphone or in the cloud, and we count on both increasing every year.
As we fluidly copy information from device to device, this situation seems durable. But that’s not necessarily true.
The amount of data we create is increasing rapidly. And if we (apocalyptically) lost the ability to produce digital storage devices—hard drives or magnetic tape, for example—our civilization’s collective digital record would begin to sprout holes within years. In decades, it’d become all but unreadable. Digital storage isn’t like books or stone tablets. It has a shorter expiration date. And, although we take storage for granted, it’s still expensive and energy hungry.
Which is why researchers are looking for new ways to archive information. And DNA, life’s very own “hard drive,” may be one solution. DNA offers incredibly dense data storage, and under the right conditions, it can keep information intact for millennia.
In recent years, scientists have advanced DNA data storage. They’ve shown how we can encode individual books, photographs, and even GIFs in DNA and then retrieve them. But there hasn’t been a scalable way to organize and retrieve large collections of DNA files. Until now, that is.
In a new Nature Materials paper, a team from MIT and Harvard’s Broad Institute describe a DNA-based storage system that allows them to search for and pull individual files, in this case images encoded in DNA. It’s a bit like thumbing through your file cabinet, reading the paper tabs to identify a folder, and then pulling the deed to your car from it. Only, obviously, the details are a bit more complicated.
“We need new solutions for storing these massive amounts of data that the world is accumulating, especially the archival data,” said Mark Bathe, an MIT professor of biological engineering and senior author of the paper. “DNA is a thousandfold denser than even flash memory, and another property that’s interesting is that once you make the DNA polymer, it doesn’t consume any energy. You can write the DNA and then store it forever.”
How to Organize a DNA Storage System
How does one encode an image in a strand of DNA, anyway? It’s a fairly simple matter of translation.
Each pixel of a digital image is encoded in bits, represented by 1s and 0s. To convert the image into DNA, scientists map these bits onto DNA’s four base molecules, or nucleotides: adenine, cytosine, guanine, and thymine, usually referred to in shorthand by the letters A, C, G, and T. The DNA bases A and G, for example, could represent 1, and C and T could represent 0.
Next, researchers string together (or synthesize) a chain of DNA bases representing each and every bit of information in the original file. To retrieve the image, researchers reverse the process, reading the sequence of DNA bases (or sequencing it) and translating the data back into bits.
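To make the translation concrete, here is a minimal Python sketch of the example mapping above, in which A and G stand for 1 and C and T stand for 0. The function names are made up for illustration, and the rule for choosing between the two bases available for each bit (alternating by position so the same base never appears twice in a row) is an assumption, not something described by the researchers. Real encoding schemes are more elaborate.

```python
ONE_BASES = "AG"   # either base can stand for a 1 bit
ZERO_BASES = "CT"  # either base can stand for a 0 bit


def bytes_to_bits(data: bytes) -> str:
    """Write each byte as eight characters of '0' or '1'."""
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Pack a bit string (length divisible by 8) back into bytes."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def encode(data: bytes) -> str:
    """Translate a file's bits into a string of DNA bases, one base per bit."""
    bases = []
    for position, bit in enumerate(bytes_to_bits(data)):
        pair = ONE_BASES if bit == "1" else ZERO_BASES
        # Assumed rule: even positions use A or C, odd positions use G or T,
        # so two neighboring bases are never identical.
        bases.append(pair[position % 2])
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Reverse the process: read bases back into bits, then into bytes."""
    bits = "".join("1" if base in ONE_BASES else "0" for base in strand)
    return bits_to_bytes(bits)


strand = encode(b"Hi")
print(strand)          # one DNA letter for every bit in the file
print(decode(strand))  # the original bytes, recovered from the bases
```

The sketch uses one base per bit, as in the example mapping, so a two-byte file becomes a 16-base strand. Decoding only needs to know which group each base belongs to, which is why the round trip recovers the original bytes.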
The standard retrieval process has a few drawbacks, however.
Researchers use a technique called polymerase chain reaction (PCR) to pull files. Each strand of DNA includes an identifying sequence that matches a short sequence of nucleotides called a PCR primer. When the primer is added to the DNA solution, it bonds with the matching DNA strands, the ones we want to read, and only those sequences are amplified (that is, copied for sequencing). The problem? Primers can interact with off-target sequences. Worse, the process uses enzymes that chew up all the DNA.
“You’re kind of burning the haystack to find the needle, because all the other DNA is not getting amplified and you’re basically throwing it away,” said Bathe.
To get around this, the MIT and Broad Institute team encapsulated the DNA strands in microscopic (6-micron) glass beads. They affixed short, single-stranded DNA labels to the surface of each bead. Like file names, the labels describe the bead’s contents. A tiger image might be labeled “orange,” “cat,” “wild.” A house cat might be labeled “orange,” “cat,” “domestic.” With just four labels per bead, you could uniquely label 10^20 DNA files.
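Where does a number as large as 10^20 come from? The article does not give the size of the label pool, so the short Python sketch below assumes a pool of 100,000 distinct label sequences and counts each ordered choice of four labels as a separate file name. Both are assumptions made for illustration, chosen because together they reproduce the figure quoted above.

```python
ASSUMED_POOL_SIZE = 100_000  # assumption: number of distinct label sequences available


def label_combinations(pool_size: int, labels_per_bead: int) -> int:
    """Count the ways to fill each label slot from the pool (ordered, repeats allowed)."""
    return pool_size ** labels_per_bead


for labels_per_bead in (1, 2, 4):
    count = label_combinations(ASSUMED_POOL_SIZE, labels_per_bead)
    print(f"{labels_per_bead} label(s) per bead: {count:.0e} possible file names")
```

Under these assumptions, each added label multiplies the number of possible file names by the size of the pool, which is why a handful of labels goes such a long way.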
The team can retrieve specific files by adding complementary nucleotide sequences, or primers, corresponding to an individual file’s label. The primers contain fluorescent molecules, and when they link up with a complementary strand—that is, the searched-for label—they form a double helix and glow. Machines separate out the glowing beads, which are opened and the DNA inside sequenced. The rest of the DNA files remain untouched, left in peace to guard their information.
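In software terms, this kind of retrieval resembles a tag search. The Python sketch below is only an analogy: the Bead class and the retrieve function are hypothetical names, the matching rule (a bead is pulled only if it carries every label in the query) is an assumption, and the real system does this with chemistry and sorting machines rather than code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bead:
    """Hypothetical stand-in for one capsule: a DNA payload plus its surface labels."""
    payload: str
    labels: frozenset


def retrieve(library, query_labels):
    """Split the library into beads carrying every queried label and all the rest."""
    wanted = frozenset(query_labels)
    selected = [bead for bead in library if wanted <= bead.labels]
    untouched = [bead for bead in library if not wanted <= bead.labels]
    return selected, untouched


library = [
    Bead("tiger image", frozenset({"orange", "cat", "wild"})),
    Bead("house cat image", frozenset({"orange", "cat", "domestic"})),
]

# Search for the labels "cat" and "wild" together.
selected, untouched = retrieve(library, ["cat", "wild"])
print([bead.payload for bead in selected])   # beads that would be pulled and sequenced
print([bead.payload for bead in untouched])  # beads that stay in the library
```

The point of the analogy is the second list: unlike the PCR approach, everything that does not match the query is left intact for later searches.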
The best part of the method is its scalability. You could, in theory, have a huge DNA library stored in a test tube—Bathe notes a coffee mug of DNA could store all the world’s data—but without an easy way to search and retrieve the exact file you’re looking for, it’s worthless. With this method, everything can be retrieved.
George Church, a Harvard professor of genetics and well-known figure in the field of synthetic biology, called it a “giant leap” for the field.
“The rapid progress in writing, copying, reading, and low-energy archival data storage in DNA form has left poorly explored opportunities for precise retrieval of data files from huge…databases,” he said. “The new study spectacularly addresses this using a completely independent outer layer of DNA and leveraging different properties of DNA (hybridization rather than sequencing), and moreover, using existing instruments and chemistries.”
This Isn’t Coming For Your Computer
To be clear, all DNA data storage, including the work outlined in this study, remains firmly in the research phase. Don’t expect DNA hard drives for your laptop anytime soon.
Synthesizing DNA is still extremely expensive. It’d cost something like $1 trillion to write a petabyte of data in DNA. To match magnetic tape, a common method of archival data storage, Bathe estimates synthesis costs would have to fall by six orders of magnitude. Also, this isn’t the speediest technique (to put it mildly).
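To put those numbers on a more familiar scale, the Python sketch below converts the rough $1 trillion per petabyte figure into a cost per gigabyte, before and after a six-order-of-magnitude drop. It assumes a decimal petabyte (one million gigabytes) and simply restates the figures quoted above. It is not an independent estimate of DNA synthesis or tape prices.

```python
GIGABYTES_PER_PETABYTE = 10**6          # assumption: decimal units
DNA_COST_PER_PETABYTE_USD = 10**12      # the article's rough "$1 trillion" figure


def cost_per_gigabyte(cost_per_petabyte_usd, orders_of_magnitude_cheaper=0):
    """Cost of one gigabyte after prices fall by the given number of factors of ten."""
    reduced_cost = cost_per_petabyte_usd / 10**orders_of_magnitude_cheaper
    return reduced_cost / GIGABYTES_PER_PETABYTE


today = cost_per_gigabyte(DNA_COST_PER_PETABYTE_USD)
target = cost_per_gigabyte(DNA_COST_PER_PETABYTE_USD, orders_of_magnitude_cheaper=6)
print(f"Implied cost today: ${today:,.0f} per gigabyte")
print(f"After a millionfold drop: ${target:,.2f} per gigabyte")
```

In other words, the quoted figures imply roughly $1 million per gigabyte today, falling to about $1 per gigabyte if synthesis became a million times cheaper.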
The cost of DNA synthesis will fall—the technology is being advanced in other areas as well—and with more work, the speed will improve. But the latter may be beside the point. That is, if we’re mainly concerned with backing up essential data for the long term with minimal energy requirements and no need to regularly access it, then speed is less important than fidelity, data density, and durability.
DNA already stores the living world’s information. Now, it seems, it can do the same for all things digital too.
Image Credit: Courtesy of the researchers (via MIT News).