Microsoft successfully automates DNA-based data storage system
Researchers from Microsoft and the University of Washington have been working since 2016 on developing one of the first complete DNA-based data storage systems, which can shrink the space needed to store digital data. The system is intended to offer random-access reads and the error correction protocols that real-world applications would require.
Microsoft announced on Thursday that a team of researchers has successfully demonstrated the first fully automated system that stores data in manufactured DNA, retrieves it and converts it back into digital data.
The researchers published their proof-of-concept system in a new paper in the Nature journal Scientific Reports on March 21.
In a simple proof-of-concept test, the research team was able to successfully encode the word “hello” in snippets of fabricated DNA and convert it back to digital data using a fully automated end-to-end system. It took the researchers 21 hours to convert five bytes of data.
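To put those figures in perspective, the end-to-end rate can be worked out from the two numbers reported above. This is only back-of-the-envelope arithmetic in Python. The variable names are illustrative, and the result describes this one demonstration run, not the speed of DNA storage in general.

```python
# Figures reported for the proof-of-concept run.
payload_bytes = 5      # the word "hello"
elapsed_hours = 21     # full write-and-read cycle

elapsed_seconds = elapsed_hours * 3600
bits_per_second = payload_bytes * 8 / elapsed_seconds

print(f"{elapsed_seconds} seconds for {payload_bytes * 8} bits")
print(f"{bits_per_second:.6f} bits per second")
```

That works out to roughly 0.0005 bits per second for the complete automated cycle.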
“We have conviction that DNA molecules are good candidates for data storage. But we are, at heart, computer architects. We really want to figure out what a future computer could look like,” Luis Ceze, a professor at UW’s Paul G. Allen School of Computer Science and Engineering, said. “What’s exciting for us here is that it’s one step toward showing a computer system that has a molecular component and an electronic component.”
The method for DNA data storage is similar to the way the DNA in our cells encodes genetic information. Encoding software converts the zeroes and ones that make up a digital file into sequences of the four basic building blocks of DNA: adenine, thymine, cytosine and guanine, or A, T, C and G. These “letters” of the DNA code stand in for the 1’s and 0’s of a computer’s binary code that can be read by digital machines.
For instance, “Hello” could be coded into the chemical string TCAACATGATGAGTA. To do so, the device first encoded the bits (1’s and 0’s) into DNA sequences (A’s, C’s, T’s, G’s), synthesized the DNA, and then stored it as a liquid. The stored DNA was then read by a DNA sequencer and finally, the sequences were translated back into bits by the decoding software.
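The encode and decode steps can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T). That mapping is hypothetical and is not the researchers' actual scheme, so it does not reproduce the chemical string quoted above, and it leaves out the error correction and random-access features the team describes. The synthesis, storage and sequencing steps happen in hardware and are not modeled here.

```python
# Hypothetical 2-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA letters, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of DNA letters back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"hello")
print(strand)          # CGGACGCCCGTACGTACGTT
print(decode(strand))  # b'hello'
```

Under this toy mapping, the five bytes of “hello” become 20 bases, four per byte.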
“Our ultimate goal is to put a system into production that, to the end user, looks very much like any other cloud storage service — bits are sent to a datacenter and stored there and then they just appear when the customer wants them,” said principal researcher Karin Strauss, a UW affiliate associate professor in the Paul G. Allen School of Computer Science and Engineering and a senior researcher at Microsoft. “To do that, we needed to prove that this is practical from an automation perspective.”
To date, the team has managed to store one gigabyte of data in DNA, besting its previous world record of 200 MB. The stored data, which includes cat photographs, great literary works, pop videos and archival recordings, was retrieved without errors, the researchers said.
However, the drawback is that writing data to DNA is expensive and extremely slow because of the chemical reactions involved in synthesis. Getting the data back is slow as well, since it involves sequencing the DNA and decoding the files back into 0s and 1s.
Information is stored in synthetic DNA molecules created in a lab, not DNA from humans or other living things, and can be encrypted before it is sent to the system. While sophisticated machines such as synthesizers and sequencers already perform key parts of the process, many of the intermediate steps have until now required manual labor in the research lab. But that wouldn’t be viable in a commercial setting, said Chris Takahashi, senior research scientist at the UW’s Paul G. Allen School of Computer Science & Engineering.
“You can’t have a bunch of people running around a datacenter with pipettes — it’s too prone to human error, it’s too costly and the footprint would be too large,” Takahashi added.
Microsoft believes that synthetic DNA could be the next big step forward in long-term data storage. “We are definitely seeing a new kind of computer system being born here where you are using molecules to store data and electronics for control and processing. Putting them together holds some really interesting possibilities for the future,” said Ceze.