As the integration of genome sequencing in scientific research, government policy, and personalized medicine continues to grow, the primary challenge for researchers has shifted from generating raw data to analyzing vast datasets. It can take a long time and require a lot of processing to examine large amounts of genomic sequence data. However, the use of graphics processing units (GPUs) offers the potential to accelerate genomic workflows significantly. In this benchmark study, the researchers from Deloitte Consulting LLP evaluate the performance of NVIDIA Parabricks, a GPU-accelerated software suite, on various cloud platforms and an NVIDIA DGX cluster.
Cloud Computing and GPU Acceleration
The development of cloud computing has addressed the data storage, sharing, and analysis issues that come with large-scale genome sequencing initiatives. In the realm of genomic research, cloud platforms such as Amazon Web Services (AWS) and Google Cloud Platform (GCP) offer researchers a formidable solution to combat data silos, extended download durations, and sluggish workflow runtimes. Through the strategic utilization of these platforms, researchers can streamline data management processes, minimize download times, and accelerate workflow runtimes, ultimately enhancing productivity and advancing genomic insights. By harnessing the scalability, on-demand resource allocation, and advanced computing infrastructure of these cloud platforms, researchers can enhance the efficiency and agility of their genomic research endeavors. The migration to the cloud has facilitated seamless data management and improved accessibility for genomic research.
Graphics Processing Units (GPUs) have revolutionized data processing and analysis in various fields, including genomics. The parallel processing power of GPUs allows for rapid analysis of large genomic datasets, enabling researchers to achieve unprecedented scalability. GPU acceleration has the potential to unlock the computational power required for whole-genome biosurveillance and personalized medicine.
Benchmarking GPU-Accelerated Workflows
Previous studies have benchmarked GPU-accelerated algorithms utilizing on-premises computing clusters, but there is limited research on their performance in the cloud. NVIDIA Parabricks, a GPU-accelerated software suite, has been benchmarked on AWS and achieved significant speedups for GATK HaplotypeCaller. However, there is a need for comprehensive benchmarking across a range of germline and somatic variant callers, as well as different cloud platforms and hardware configurations.
In this study, the researchers evaluated two germline variant callers and four somatic variant callers using both CPU and GPU algorithms implemented with NVIDIA Parabricks, and the performance was compared on AWS, GCP, and an NVIDIA DGX cluster.
For germline variant callers, the researchers observed significant speedups with GPU acceleration, achieving up to 65x acceleration compared to CPU runs. The runtimes of HaplotypeCaller were reduced from 36 hours to 33 minutes on AWS, 35 minutes on GCP, and 24 minutes on the NVIDIA DGX. GPU-accelerated germline callers resulted in cost savings compared to CPU runs on cloud platforms.
However, somatic variant callers exhibited more variation in the number of GPUs required for optimal performance. Some somatic callers showed limited scalability with additional GPUs, emphasizing the need for algorithmic benchmarking before deploying large-scale projects. Although GPU runs were faster for some somatic callers, the increased GPU cost outweighed the performance gains, making them more expensive than CPU runs.
This study demonstrates the potential of GPU acceleration in genomic workflows. Germline variant callers showed linear scalability with the number of GPUs and achieved significant acceleration, leading to cost savings on cloud platforms. However, somatic variant callers exhibited variations in performance with different numbers of GPUs, underscoring the need for careful optimization and benchmarking.
By leveraging GPU-accelerated workflows, researchers can accelerate genomic analysis and process more samples within a fixed budget. This technology brings us closer to achieving significant advancements in biosurveillance and personalized medicine. Future studies should focus on optimizing algorithms and hardware configurations for different genomic workflows and platforms to further enhance the efficiency and cost-effectiveness of GPU-accelerated genomics. Hence, GPUs offer a promising solution to computational challenges in large-scale genomics data analysis.
Article Source: Reference Paper
Dr. Tamanna Anwar is a Scientist and Co-founder of the Centre of Bioinformatics Research and Technology (CBIRT). She is a passionate bioinformatics scientist and a visionary entrepreneur. Dr. Tamanna has worked as a Young Scientist at Jawaharlal Nehru University, New Delhi. She has also worked as a Postdoctoral Fellow at the University of Saskatchewan, Canada. She has several scientific research publications in high-impact research journals. Her latest endeavor is the development of a platform that acts as a one-stop solution for all bioinformatics related information as well as developing a bioinformatics news portal to report cutting-edge bioinformatics breakthroughs.