A team of scientists led by a former University of British Columbia (UBC) postdoctoral researcher in medical genetics has discovered nearly nine times as many RNA viruses as were previously known, including several novel coronavirus species.
The researchers made the finding by re-analyzing all publicly accessible RNA sequencing data. They established a planetary-scale database of RNA viruses that might help quickly identify viral spillover into people, as well as viruses that harm livestock, crops, and endangered species.
The Serratus Project partnership, led by Dr. Artem Babaian, has published astounding results in the prominent scientific journal Nature.
The Serratus Project was able to develop a “ridiculously powerful” supercomputer by collaborating with the Cloud Innovation Centre (CIC), a public/private partnership between UBC and Amazon Web Services (AWS), according to Dr. Babaian.
The supercomputer examined 20 million gigabytes of publicly available gene sequence data from 5.7 million biological samples worldwide, searching for a specific gene that indicates the presence of an RNA virus. The samples, which range from ice cores to animal faeces, have been gathered and openly shared within the global research community over the past 13 years.
The Serratus Project team discovered 132,000 RNA viruses, compared to just 15,000 previously known. The newly discovered species include nine novel coronaviruses.
Dr. Babaian estimates that, without CIC, a traditional supercomputer would have required more than a year and millions of dollars to perform the 2,000 years of CPU time necessary for this analysis. Serratus completed it in 11 days at a total cost of $24,000.
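To put those figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted above. The function names are hypothetical, the 365.25-day year is an assumption, and "CPU time" is assumed to mean time on a single CPU. The results are rough illustrations, not figures reported by the Serratus team.

```python
# Figures quoted in the article.
CPU_YEARS = 2_000          # total CPU time needed for the analysis
WALL_CLOCK_DAYS = 11       # how long Serratus actually took
TOTAL_COST_USD = 24_000    # reported total cost

# Assumption (not from the article): average length of a year in days.
DAYS_PER_YEAR = 365.25


def implied_parallelism(cpu_years, wall_days, days_per_year=DAYS_PER_YEAR):
    """Average number of CPUs that must run at once to fit the work into wall_days."""
    cpu_days = cpu_years * days_per_year
    return cpu_days / wall_days


def cost_per_cpu_hour(total_cost, cpu_years, days_per_year=DAYS_PER_YEAR):
    """Average cost in dollars of one hour of CPU time."""
    cpu_hours = cpu_years * days_per_year * 24
    return total_cost / cpu_hours


if __name__ == "__main__":
    cpus = implied_parallelism(CPU_YEARS, WALL_CLOCK_DAYS)
    rate = cost_per_cpu_hour(TOTAL_COST_USD, CPU_YEARS)
    print(f"Average CPUs running concurrently: {cpus:,.0f}")
    print(f"Average cost per CPU-hour: ${rate:.5f}")
```

Under these assumptions, the job averaged roughly 66,000 CPUs running at once, at a little over a tenth of a cent per CPU-hour.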
“We’re entering a new era of understanding the genetic and spatial diversity of viruses in nature and how a wide variety of animals interface with these viruses. The hope is we’re not caught off guard if something like SARS-CoV-2, the novel coronavirus that causes COVID-19, emerges again. These viruses can be recognized more easily, and their natural reservoirs can be found faster. The real goal is these infections are recognized so early that they never become pandemics,” said Dr. Babaian.
“If a patient presents with a fever of unknown origin, once that blood is sequenced, you can now connect that unknown virus in the human to a way bigger database of existing viruses. If a patient, for example, presents with a viral infection of unknown origin in St. Louis, you can now search through the database in about two minutes and connect that virus to, say, a camel in sub-Saharan Africa sampled in 2012.”
“While the public cloud as we know it has been around for 15 years, the last few years of innovation at Amazon Web Services have really made genomics research possible in a new way,” said Coral Kennett, the head of the Centre for Amazon Web Services. “We were able to give Artem access to compute power for pennies a query. We highly encourage the research community to submit their projects and ideas to the Cloud Innovation Centre so that more innovation comes to light benefitting the community.”
Story Source: Robert C. Edgar et al, Petabase-scale sequence alignment catalyses viral discovery, Nature (2022). DOI: 10.1038/s41586-021-04332-2
Image Source: Map of global RNA sequencing data that Dr. Babaian and his team analyzed to identify new RNA viruses [Source: Serratus Project]