Genomic analysis plays a vital role in areas such as disease detection, drug development, and the identification of genetic disorders. As genomic databases grow exponentially and the demand for efficient data processing rises, GPUs have emerged as powerful tools for accelerating genomic analysis. In this article, we examine the use of GPUs for genomic research and explore the Genomics-GPU benchmark suite, which consists of ten widely used genomic analysis applications.

The Importance of Genomics Research

To support research in this area, it is essential to have benchmark suites made up of representative and diverse applications that run on GPUs. The Genomics-GPU benchmark suite was created in response to this need. It comprises ten widely used genomic analysis applications covering tasks such as genome comparison, matching, and clustering for DNA and RNA. These applications have also been adapted to exploit CUDA Dynamic Parallelism (CDP), an advanced feature that supports dynamic GPU programming, to improve performance further.

Genomics-GPU Benchmark Suite

The Genomics-GPU benchmark suite serves as a foundation for algorithm optimization and facilitates the development of GPU architectures for genomics analysis. It enables researchers to evaluate and compare the performance of different GPU-based solutions. By providing representative applications and input datasets of various sizes, the benchmark suite ensures that evaluations are conducted comprehensively and rigorously.

GPUs are designed specifically for parallel computing and offer exceptional computational power. Modern GPUs consist of multiple Streaming Multiprocessors (SMs) and several layers of memory partitions. Each SM contains control units, registers, execution pipelines, scratchpad memory, and caches. GPUs achieve high throughput by executing threads in warps, which are fixed-size SIMD (Single Instruction, Multiple Data) batches of threads. Running many warps concurrently on a single SM lets the hardware switch to a ready warp while another waits, which significantly improves overall throughput.
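To make the warp concept concrete, here is a minimal Python sketch of how the threads of one block are grouped into warps. It assumes a warp width of 32 threads, which is the size used on NVIDIA GPUs, and a linear thread index within the block. The function names are illustrative only and are not taken from the Genomics-GPU suite.

```python
WARP_SIZE = 32  # assumed warp width (the value used on NVIDIA GPUs)


def warps_per_block(threads_per_block, warp_size=WARP_SIZE):
    """Number of warps needed to hold all threads of one block (rounded up)."""
    return (threads_per_block + warp_size - 1) // warp_size


def warp_and_lane(thread_index, warp_size=WARP_SIZE):
    """Map a linear thread index in a block to (warp id, lane id within the warp)."""
    return divmod(thread_index, warp_size)


def idle_lanes(threads_per_block, warp_size=WARP_SIZE):
    """Lanes left unused in the last warp when the block size is not a multiple of the warp size."""
    return warps_per_block(threads_per_block, warp_size) * warp_size - threads_per_block


if __name__ == "__main__":
    for block_size in (100, 256):
        print(block_size, warps_per_block(block_size), idle_lanes(block_size))
```

A block whose size is not a multiple of the warp width still occupies a whole final warp, which is one reason block sizes are usually chosen as multiples of 32.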

Leveraging CUDA Dynamic Parallelism (CDP)

One notable feature of the CUDA programming model is CUDA Dynamic Parallelism (CDP). CDP lets a kernel that is already running on the GPU launch child kernels directly from device code, without returning control to the CPU. These nested launches give programs more flexibility and control over execution, because the amount of parallel work can be decided at run time from the data being processed. Parent and child kernels exchange data through global memory; shared memory and local memory remain private to each kernel and are not visible across the parent-child boundary. However, CDP comes with some overhead, including API calls, kernel parameter parsing, device runtime setup, and the management of child kernels.

Performance Comparison with CPUs

To compare the performance of CPUs and GPUs in genomic analysis, several popular algorithms were evaluated, including Smith-Waterman (SW), Needleman-Wunsch (NW), and the Center Star Algorithm (STAR). The results showed that GPUs can deliver a large performance advantage over CPUs, with speedups of up to 20-fold. Using CDP can improve performance further, cutting execution times by more than half in some cases.
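For readers unfamiliar with the two alignment algorithms named above, the following is a minimal CPU-side sketch in Python of their scoring recurrences. It is not the benchmark suite's implementation: the scoring values (match = 2, mismatch = -1, linear gap = -2) and the function names are assumptions chosen for illustration, and only the final score is computed, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty, assumed scores)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score of a and b (linear gap penalty, assumed scores)."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: leading gaps only
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)  # first column: leading gaps only
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on the same anti-diagonal can be computed independently. That independence is what makes these algorithms a natural fit for the many parallel threads of a GPU.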

Performance Analysis and Bottlenecks

Several factors affect the overall performance of genomic analysis applications. One crucial aspect is the kernel execution pattern, which is characterized by the number of kernel function calls the CPU issues to the GPU. The results indicate that most applications make more kernel function calls than PCI transactions, which highlights their computational intensity. However, certain applications, such as GASAL2, show a larger number of PCI transactions, indicating potential communication bottlenecks.

Identifying performance bottlenecks is essential for optimizing the execution of genomic analysis applications. A detailed examination of pipeline stalls using the GPU simulator Accel-Sim shows that long memory delays are the major source of stalls, accounting for up to 95% of all pipeline stalls. Control hazards and pipeline idling are other significant causes. Understanding these bottlenecks helps researchers prioritize optimizations and make informed decisions to improve the efficiency of genomic analysis on GPUs.


Conclusion

GPUs have demonstrated significant advantages over traditional CPU-based approaches for genomic analysis, accelerating data processing and analysis. The Genomics-GPU benchmark suite is a valuable resource for researchers looking to test and improve their algorithms, investigate the possibilities of GPU architectures, and promote genomics innovation. By leveraging the computational power of GPUs and advancements like CUDA Dynamic Parallelism, genomics research can continue to advance our understanding of genetic information, leading to improved healthcare outcomes and progress in personalized medicine.

Article Source: The research work was presented as a conference paper at the IEEE International Symposium on Performance Analysis of Systems and Software, 2023. Genomics-GPU is open source and available on GitHub.


Dr. Tamanna Anwar is a Scientist and Co-founder of the Centre of Bioinformatics Research and Technology (CBIRT). She is a passionate bioinformatics scientist and a visionary entrepreneur. Dr. Tamanna has worked as a Young Scientist at Jawaharlal Nehru University, New Delhi. She has also worked as a Postdoctoral Fellow at the University of Saskatchewan, Canada. She has several scientific research publications in high-impact research journals. Her latest endeavor is the development of a platform that acts as a one-stop solution for all bioinformatics related information as well as developing a bioinformatics news portal to report cutting-edge bioinformatics breakthroughs.

