DNA and the Future of Digital Data Storage

There is a growing shortage of available space for storing digital data worldwide. While this issue has existed for several years, most people rarely think about it. Not long ago, the amount of digital data you could store was limited by your computer’s hard drive. When you ran out of space, you either bought a new hard drive or burned data onto optical discs. If those filled up, you simply deleted old files to make room for new ones. But some people—especially companies whose business and value depend on their digital information—never delete data.

Times have changed, and technology has evolved. Now, instead of deleting files, we move data to “the cloud.” The term is abstract and says nothing about the physical servers behind it, but it’s convenient and has stuck. Where is the data actually stored? Most people don’t care, as long as they can access it whenever they want. Could we eventually run out of space in the cloud? Few people consider this. As long as you pay your subscription, everything seems fine. Need more space? Just upgrade your plan and get even more storage for your information.

This convenience has made it hard to imagine that one day we might actually run out of digital storage space, just as it was once hard to imagine that we could run out of fresh water, since the water cycle constantly replenishes it. Yet in 2018, Cape Town, South Africa, came dangerously close to running out of water. In the same way, we are rapidly approaching a shortage of digital storage space.

Data, Data Everywhere

The main reason for this looming shortage is the incredible rate at which we generate new data. Every day, 3.7 billion internet users create about 2.5 quintillion bytes of information. In fact, 90 percent of all digital data in existence was created in just the last two years. With the rise of smart devices connected to the Internet of Things, these numbers are set to grow even faster.

“When people talk about cloud storage, they often assume there’s an infinite amount of space,” says Hyunjun Park, CEO and co-founder of Catalog, a data storage company, in an interview with Digital Trends. “But the cloud is just another computer where your data is stored. People don’t realize that we’re generating so much digital data that the rate of production far outpaces our ability to store it all. Very soon, we’ll see a huge gap between the amount of valuable data and our ability to store it using traditional media.”

Cloud storage companies are constantly building new data centers or expanding existing ones, making it hard to predict exactly when we’ll run out of space. Still, Park estimates that by 2025, humanity could generate over 160 zettabytes of digital information (a zettabyte is a trillion gigabytes). How much of that can we actually store? About 12.5 percent, according to Park. Clearly, this is a problem that needs a solution.
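Park’s figures can be put in perspective with some quick arithmetic. This assumes decimal units (1 ZB = 10^21 bytes) and treats his 160 ZB and 12.5 percent figures as the rough projections they are:

```latex
\begin{aligned}
160\ \text{ZB} &= 160 \times 10^{21}\ \text{bytes} = 1.6 \times 10^{14}\ \text{GB} \\
0.125 \times 160\ \text{ZB} &= 20\ \text{ZB (storable)} \\
160\ \text{ZB} - 20\ \text{ZB} &= 140\ \text{ZB (no storage available)}
\end{aligned}
```

By Park’s numbers, roughly seven out of every eight bytes generated would have nowhere to go.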

Could DNA Be the Answer?

Park, Nathaniel Roquet, and their colleagues at MIT believe so. Together, they founded Catalog, a company that has developed a technology they believe could revolutionize how we store digital data in the near future. According to them, soon all the world’s digital data could fit in a space no larger than a wardrobe.

Catalog’s solution is to encode data into DNA. It sounds like something out of a Michael Crichton novel, but their scalable and affordable approach has already attracted $9 million in venture funding and support from leading professors at Stanford and Harvard.

“People often ask me whose DNA we use. They seem to think we take someone’s DNA and turn them into mutants or something,” Park laughs. But that’s not what Catalog does. The DNA it uses for data storage is a synthetic polymer, made chemically rather than taken from any living organism. It is built from the same four bases as natural DNA, but its sequences encode zeros and ones rather than genes, so it is not a code for anything living. Still, the end product is almost indistinguishable from the DNA found in living cells.

The idea of using DNA as an alternative medium for digital storage has been around for decades, almost since James Watson and Francis Crick described DNA’s structure in 1953. But until recently, significant limitations prevented us from realizing DNA’s potential for data storage, let alone making it practical.

Traditionally, DNA data storage meant synthesizing new DNA molecules from scratch. Sequences of bits were mapped to the four DNA bases (A, C, G, and T), and then enough custom molecules were produced to represent all the data you wanted to store. The problem with this method is that building each molecule base by base is expensive and slow, which severely limits how much data can realistically be written.
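The bit-to-base mapping described above can be sketched in a few lines of Python. The scheme below packs two bits into each base (00→A, 01→C, 10→G, 11→T). It is an illustrative assumption, not any company’s actual encoding. Practical systems also add error correction and avoid long runs of the same base, and this sketch ignores both.

```python
# Illustrative mapping (an assumption for this sketch): two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Decode a string produced by bytes_to_dna back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = bytes_to_dna(b"Hi")
    print(strand)  # CAGACGGC
    assert dna_to_bytes(strand) == b"Hi"
```

Each byte becomes four bases, so every new file requires a fresh, custom sequence to be synthesized base by base.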

Catalog’s approach separates the synthesis of molecules from the encoding process. The company first mass-produces a large quantity of specific molecules, which is much cheaper. It then encodes information by drawing on the variety of these ready-made molecules instead of synthesizing new ones for every file.

To illustrate, Catalog compares the old method to manufacturing custom hard drives with pre-recorded information. Recording new data would mean making a new hard drive from scratch. Their new approach is like mass-producing blank hard drives and writing new encoded information onto them as needed.
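Combinatorial encoding is one way to picture “mass-producing blanks.” The sketch below is a simplified, hypothetical model, not Catalog’s published method. A small library of prefabricated DNA fragments is arranged in layers; the sequences in `LAYERS` are invented for illustration. Every combination of one fragment per layer forms a distinct identifier molecule. Each bit is then stored as the presence (1) or absence (0) of its identifier in the final pool.

```python
from itertools import product

# Hypothetical component library: each layer holds prefabricated DNA fragments.
# These short sequences are invented purely for illustration.
LAYERS = [
    ["ACGT", "TGCA"],          # layer 0: 2 fragments
    ["GGAA", "CCTT", "ATAT"],  # layer 1: 3 fragments
    ["CAGT", "GTCA"],          # layer 2: 2 fragments
]

# One fragment from each layer forms an identifier: 2 * 3 * 2 = 12 identifiers.
IDENTIFIERS = ["".join(parts) for parts in product(*LAYERS)]


def encode(bits: str) -> set[str]:
    """Return the identifiers to assemble: one for every bit that is 1."""
    if len(bits) > len(IDENTIFIERS):
        raise ValueError("not enough identifiers for this many bits")
    return {IDENTIFIERS[i] for i, bit in enumerate(bits) if bit == "1"}


def decode(pool: set[str], n_bits: int) -> str:
    """Recover bits by checking which identifiers are present in the pool."""
    return "".join("1" if IDENTIFIERS[i] in pool else "0" for i in range(n_bits))


if __name__ == "__main__":
    data = "101100111010"
    pool = encode(data)
    print(len(pool), "identifiers assembled for", len(data), "bits")
    assert decode(pool, len(data)) == data
```

Only 7 fragments (2 + 3 + 2) are manufactured here, yet they yield 12 identifiers. Each added layer multiplies the number of identifiers, while the number of fragments to produce grows only additively. That is the economy the hard-drive analogy points to.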

The Power of DNA Storage

The beauty of this technology is the sheer amount of data that can be stored in a tiny space. As a demonstration, Catalog encoded various science fiction books into DNA, including the entire “Hitchhiker’s Guide to the Galaxy” series. But that’s just the beginning.

“If you compare the numbers, DNA can store about a million times more data than solid-state drives in the same volume. In other words, a DNA-based device the size of an ordinary flash drive could hold roughly a million times more information than a standard USB stick,” say the developers.

However, they note that comparing DNA storage to solid-state drives isn’t entirely accurate. DNA can store much more data in the same volume, but it doesn’t allow for instant access like a USB drive. Catalog’s technology transforms information into a solid physical pellet made of synthetic polymer.

To access the data, you rehydrate the encoded synthetic polymer pellet with water and then “read” it using a DNA sequencer. The sequencer determines the order of the DNA bases, and that sequence is then converted back into the zeros and ones that make up the original information. The whole process can take several hours from start to finish.
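A sequencer produces many imperfect reads of the same molecules, so turning bases back into bits usually involves some error handling. The sketch below assumes equal-length reads that contain only substitution errors. It reuses the illustrative two-bits-per-base mapping (00→A, 01→C, 10→G, 11→T), and the function names `consensus` and `reads_to_bits` are hypothetical. Real pipelines must also handle insertions, deletions, and error-correcting codes, and Catalog’s own decoding may work quite differently.

```python
from collections import Counter

# Illustrative two-bits-per-base mapping (an assumption, not a real standard).
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def consensus(reads: list[str]) -> str:
    """Majority-vote each position across equal-length reads of one strand."""
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("expected one or more reads of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def reads_to_bits(reads: list[str]) -> str:
    """Collapse noisy reads to a consensus sequence, then map each base to two bits."""
    return "".join(BASE_TO_BITS[base] for base in consensus(reads))


if __name__ == "__main__":
    # Three reads of the same strand; two contain a single substitution error.
    reads = ["CAGACGGC", "CAGTCGGC", "CAGACGGA"]
    bits = reads_to_bits(reads)
    print(bits)  # 0100100001101001
    print(int(bits, 2).to_bytes(len(bits) // 8, "big"))  # b'Hi'
```

Reading several copies and voting is a simple way to show why sequencing redundancy helps. It is not a substitute for the proper error-correcting codes a production system would use.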

Because of this, the technology is primarily aimed at the archival market, where fast access isn’t necessary. This usually means data that is rarely or never used after being stored, but is still extremely important to keep—like a warranty for your refrigerator, but on a corporate scale.

What does this mean for everyday users? As mentioned earlier, most of us don’t care where or how our data is stored (on solid-state drives, magnetic tape, or anything else) as long as we can access it when we need to. Because retrieving information from DNA takes hours, it’s unlikely that services like Google Cloud or Yandex.Disk will ever keep our everyday files in giant DNA vats. If Catalog’s technology proves effective, it will most likely find its niche in long-term archival storage. Data that must be accessed quickly will still need hard drives, SSDs, and other fast media.

Looking Ahead

Still, the possibilities are almost science-fictional. “Imagine having a pellet implanted under your skin that contains all your health information: your MRI scans, blood type, dental X-rays,” says Park. “You’d want all that data to be available to you at all times, but you wouldn’t want it stored in the cloud or on an unsecured hospital server. With your data in DNA form, you could physically control it, access it when needed, restrict access to others, and share it directly with your doctors.”

“Almost every modern hospital has a DNA sequencer. I’m not saying that’s our immediate goal, but in the future, it could become possible,” Park adds.

Currently, Catalog is working on experimental projects to demonstrate the effectiveness of their technology. “There are no insurmountable scientific challenges ahead of us; it’s more about optimizing the mechanical processes,” Park notes.

Park says he got involved in DNA data storage research simply because he thought it was a cool and innovative technological approach to a big problem. Now, he believes this technology could become one of the most important of our time.
