Thanks to the cloud, it’s hard to imagine that we’ll ever run out of data storage. But by 2040, we may be swarmed by three septillion bits of data, and Earth will run out of chip-grade silicon. According to one estimate, current data farms will last a century, if that.
For the “big data” revolution to continue, we need to radically rethink our hard drives. Thanks to evolution, we already have a clue.
Our bodies are jam-packed with data, tightly compacted inside microscopic structures within every cell. Take DNA: with just four letters, it encodes every single molecular process that keeps us running. That sort of combinatorial complexity is still unheard of in silicon-based data storage.
Add to this the fact that DNA can be dehydrated and kept intact for eons (500,000 years and counting), and it’s no surprise that scientists have been exploiting its properties to encode information. To famed synthetic biologist Dr. George Church, looking to biology is a no-brainer: even the simple bacterium E. coli has a data storage density of 10^19 bits per cubic centimeter. Translation? A single cube of DNA measuring one meter on each side could meet all of the world’s current data storage needs.
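As a rough sanity check on that claim, the arithmetic can be worked through in a few lines of Python. The sketch takes the quoted density as 10 to the 19th power bits per cubic centimeter, assumes it holds uniformly across a full cubic meter, and compares the result with the three-septillion-bit projection mentioned at the top of this article. The variable names are purely illustrative.

```python
# Back-of-the-envelope check using only figures quoted in the article.
BITS_PER_CM3 = 10**19              # quoted E. coli storage density
CM3_PER_M3 = 100**3                # 100 cm per meter, cubed
PROJECTED_2040_BITS = 3 * 10**24   # "three septillion bits"

bits_per_m3 = BITS_PER_CM3 * CM3_PER_M3
headroom = bits_per_m3 / PROJECTED_2040_BITS

print(f"One cubic meter: {bits_per_m3:.0e} bits")
print(f"Headroom over the 2040 projection: {headroom:.1f}x")
```

Under those assumptions, a one-meter cube would hold a little over three times the projected 2040 total.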
But what if we can do even better?
DNA is essentially a biological index, eventually expanded to generate the entire slew of biochemical players that underlie day-to-day operations in the body—the metabolome. Compared to DNA, which only has four “bits,” the metabolome is a massive ecosystem of hundreds of thousands of molecules. In other words, it’s a ready-made goldmine, ready to be filled with information of our own choosing—if we can tap into it.
This month, a team led by Dr. Jacob Rosenstein at Brown University provided one of the first proofs-of-concept that DNA isn’t the only bio-player in the data-storage world. It’s not even the most powerful. DNA is relatively easy to read and write, which makes it the practical, low-hanging bio-storage medium with the most momentum today. But it’s time to look more broadly.
“It’s not hard to recognize that cells and organisms use small molecules to transmit information, but it can be harder to generalize and quantify,” said study author Dr. Eamonn Kennedy. “We wanted to demonstrate how a metabolome can encode precise digital information.”
Meet Your Metabolome
If DNA is a best-selling novel, then the metabolome is its faithful movie adaptation.
Ultimately, the metabolome receives its instructions from DNA. But because of its complexity in numbers, structures, and variety, the metabolome provides an entire ecosystem of information carriers with much more information density than the original content. That’s the source of the metabolome’s superior data-storing power: its diversity of molecules that coexist and interact in mind-boggling combinations.
“The theoretical limit for molecular information [using the metabolome] is two orders of magnitude denser by mass than DNA,” explained Rosenstein in an earlier paper. “We are optimistic that many new classes of molecular storage media will be developed.”
The problem, of course, is how to wrangle messy biological ingredients to encode digital information. DNA is relatively simple—it’s not hard to map a 0 and 1 system to an A, T, C, and G one. But where to start with a slew of biomolecules?
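To make the DNA case concrete, here is a minimal sketch of one such mapping, in which every pair of bits becomes one base. The specific assignment (00 to A, 01 to C, and so on) is an arbitrary choice for illustration, and the function names are hypothetical. Real DNA storage schemes typically layer extra constraints and error correction on top of something like this.

```python
# Illustrative 2-bits-per-base mapping; the assignment itself is an assumption.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def bits_to_dna(bits: str) -> str:
    """Translate a bit string of even length into a DNA sequence."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string must have an even length")
    pairs = (bits[i:i + 2] for i in range(0, len(bits), 2))
    return "".join(BASE_FOR_BITS[pair] for pair in pairs)


def dna_to_bits(dna: str) -> str:
    """Translate a DNA sequence back into the original bit string."""
    return "".join(BITS_FOR_BASE[base] for base in dna)


print(bits_to_dna("00011011"))  # ACGT
print(dna_to_bits("ACGT"))      # 00011011
```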
The Metabolome Hard Drive
The team began by making soups of metabolites: sugars, amino acids, vitamins, and other small molecules that we commonly use to digest food or keep our bodies operational. In all, they tapped 36 common chemicals and used the presence or absence of each to encode a 1 or a 0. By creating this chemical-digital bridge, the team hoped to eventually broaden the scope of the metabolome hard drive to anything we can currently store in silicon chips. The number of different metabolites available to a particular mixture determined how many bits it could store, and the study mainly focused on either 6 or 12 bits as a first demo.
To precisely move the liquid components onto a steel plate for storage and reading, the team used energy from ultrasound waves (no kidding) to push tiny amounts of fluid to their target point. Dot by dot, the ultrasound robot generated plates neatly lined with columns and rows of metabolic fluids with different compositions.
For their initial proof of concept, the team focused on writing and reading simple digital images in the language of the metabolome. Need a visual? Picture Minecraft-like blocky black-and-white clip art. To encode the inputs, they constructed a dictionary. For example, a particular spot on the steel plate was designated to represent the binary code 0101. In practice, this meant that the spot contained two particular types of metabolites, one for each 1 in the code. An adjacent spot corresponded to a different binary code, with its own unique mix of metabolites.
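The presence-or-absence scheme can be sketched in a few lines. This is an illustration only: the four-entry library uses placeholder names (M1 to M4) rather than the study’s actual chemicals, and encode_spot and decode_spot are hypothetical helpers.

```python
# Placeholder library; each position in the list stands for one bit position.
LIBRARY = ["M1", "M2", "M3", "M4"]


def encode_spot(bits: str, library: list[str]) -> set[str]:
    """Return the set of metabolites to deposit for a given bit string."""
    if len(bits) != len(library) or set(bits) - {"0", "1"}:
        raise ValueError("need exactly one 0/1 digit per library metabolite")
    return {name for name, bit in zip(library, bits) if bit == "1"}


def decode_spot(mixture: set[str], library: list[str]) -> str:
    """Recover the bit string from the metabolites found in a spot."""
    return "".join("1" if name in mixture else "0" for name in library)


spot = encode_spot("0101", LIBRARY)
print(sorted(spot))                # ['M2', 'M4']
print(decode_spot(spot, LIBRARY))  # 0101
```

With n metabolites in the library, each spot carries n bits and can hold any of 2**n distinct mixtures.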
Dictionary in hand, the team then encoded an anchor, an ibex (a wild goat with long, curved horns), and an Egyptian cat onto multiple steel plates. For example, to “write” the ibex image into the metabolome, the team used a mixture of six different molecules, with the presence or absence of each encoding a 1 or a 0. In total, the project produced thousands of liquid dots across multiple plates, each plate providing enough binary storage space to encode images of more than 17,000 pixels. Altogether, that added up to more than 100,000 bits of information.
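One plausible way to lay an image out across spots is sketched below. It assumes each black-or-white pixel is one bit and that consecutive pixels are grouped six at a time, one group per spot. The article does not spell out the study’s exact layout, so treat the grouping, the zero padding, and the function names as assumptions.

```python
def image_to_spots(pixels, bits_per_spot=6):
    """Split a flat list of 0/1 pixels into fixed-size groups, one per spot."""
    padding = -len(pixels) % bits_per_spot   # zeros needed to fill the last spot
    padded = list(pixels) + [0] * padding
    return [padded[i:i + bits_per_spot]
            for i in range(0, len(padded), bits_per_spot)]


def spots_to_image(spots, n_pixels):
    """Flatten the spots back into pixels and drop any padding."""
    flat = [bit for spot in spots for bit in spot]
    return flat[:n_pixels]


image = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]   # a toy 10-pixel image
spots = image_to_spots(image)
print(len(spots))                                   # 2
print(spots_to_image(spots, len(image)) == image)   # True
```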
Retrieving that data took another machine: the mass spectrometer, a mature technology routinely used in chemistry to test for the presence of a chemical and its amount. Finally, using an algorithm that separates signal from noise, the team was able to read each 2kb file, identify the molecules, and reconstruct the image with roughly 99 percent accuracy.
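The readout step can be pictured roughly as follows. The article does not describe the team’s signal-versus-noise algorithm, so this sketch swaps in the simplest possible stand-in: a fixed cutoff on each metabolite’s measured intensity. The normalized 0-to-1 intensity scale, the 0.5 threshold, the placeholder metabolite names, and both function names are assumptions.

```python
def read_spot(intensities, library, threshold=0.5):
    """Call a 1 for every library metabolite whose signal clears the threshold."""
    return "".join(
        "1" if intensities.get(name, 0.0) >= threshold else "0"
        for name in library
    )


def bit_accuracy(written: str, read: str) -> float:
    """Fraction of bit positions where the readout matches what was written."""
    if len(written) != len(read):
        raise ValueError("bit strings must have the same length")
    matches = sum(w == r for w, r in zip(written, read))
    return matches / len(written)


library = ["M1", "M2", "M3", "M4"]                  # placeholder names
measured = {"M1": 0.05, "M2": 0.90, "M4": 0.80}     # made-up example signals
print(read_spot(measured, library))                 # 0101
print(bit_accuracy("0101", "0111"))                 # 0.75
```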
To be fair, 2kb is pretty small. But as a first proof of concept, the study suggests small biomolecules have the capacity to work together for data storage.
To put file size into perspective, DNA storage has so far reached about 214 petabytes per gram, and its theoretical limit is much higher. Thanks to its chemical diversity, a metabolome could in theory easily match this level of storage using libraries of small molecules that are already available, the team said.
In other words, the system is massively scalable. It’s true that upping the diversity of molecules, which increases bits, will also make it harder to precisely read the data. However, with error-correcting codes built into the system, and with the help of increasingly sophisticated techniques for profiling molecules (think artificial neural networks and genetic algorithms), the prospect seems pretty bright. The current number of identified metabolome molecules is 100,000, a huge leap from DNA’s four letters. Even if only a fraction of these are stable and readable, we’re looking at a major storage boost.
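As one illustration of how error correction could absorb a misread molecule, here is a minimal Hamming(7,4) sketch: four data bits are spread over seven presence-or-absence slots, and any single flipped slot can be repaired. The article does not say which code the team has in mind, so this is a textbook example chosen for brevity, not the study’s method.

```python
def hamming_encode(data):
    """Encode 4 data bits as a 7-bit codeword (parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming_decode(codeword):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3   # 1-based position; 0 means no error
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming_encode([0, 1, 0, 1])
word[4] ^= 1                   # simulate one misread metabolite
print(hamming_decode(word))    # [0, 1, 0, 1]
```

The trade-off is capacity: seven slots now carry four bits of payload instead of seven.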
More optimization will also boost input-retrieval speeds. The study was able to write data at about five bits per second, and read it roughly two times faster, but there’s “significant room to improve,” the authors said.
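To see what those rates mean in practice, the arithmetic below estimates how long the roughly 100,000 bits from this study would take to write and read. It takes the quoted write rate and the “roughly two times faster” read rate at face value and ignores any setup or plate-handling time. The constant and function names are illustrative.

```python
WRITE_BITS_PER_SECOND = 5.0                         # quoted write rate
READ_BITS_PER_SECOND = 2 * WRITE_BITS_PER_SECOND    # "roughly two times faster"
TOTAL_BITS = 100_000                                # approximate total in the study


def transfer_hours(n_bits, bits_per_second):
    """Hours needed to move n_bits at a constant rate."""
    return n_bits / bits_per_second / 3600


write_hours = transfer_hours(TOTAL_BITS, WRITE_BITS_PER_SECOND)
read_hours = transfer_hours(TOTAL_BITS, READ_BITS_PER_SECOND)
print(f"write: {write_hours:.1f} h")   # write: 5.6 h
print(f"read:  {read_hours:.1f} h")    # read:  2.8 h
```

At these speeds, the whole dataset would take about five and a half hours to write and just under three to read.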
That said, the system has a massive built-in bug. Because metabolites are small chemicals that can interact with each other, unplanned chemical reactions could erode encoding precision. But that reactivity could also be turned into a powerful feature: it may allow scientists to overwrite data, or transform data in predictable ways. Although still early, it “hints at possibilities for synthetic metabolomic computation,” the authors said.
Fundamentally, the study hopes to encourage people to think more broadly and creatively about the range of possible biological hard drives.
“Research like this challenges what people see as being possible in molecular data systems,” said study author Dr. Brenda Rubenstein. “DNA is not the only molecule that can be used to store and process information. It’s exciting to recognize that there are other possibilities out there with great potential.”