
KCL Researchers Develop DNAscan2: A Highly Flexible, End-to-End Pipeline for NGS Data Analysis

Next-generation sequencing (NGS) is becoming increasingly accessible and affordable, and its adoption in clinical and biomedical genetics is growing accordingly. To keep pace with this growth, researchers at King’s College London have developed DNAscan2, a novel, user-friendly, end-to-end pipeline for analyzing NGS data. Its versatility, its scalability, and its open-source availability make it a pioneering tool for a wide range of users, from clinical geneticists working on disease diagnostics to biomedical researchers running large genomic studies.

Versatility, Scalability, and User-friendliness of DNAscan2

As versatile and scalable software, DNAscan2 is designed to meet the distinct requirements of a wide range of users. Unlike many other NGS pipelines, which cover only specific parts of the analysis (non-end-to-end pipelines), DNAscan2 handles the whole NGS analysis process. It can detect multiple variant types, including single nucleotide variants (SNVs), small indels, transposable elements, short tandem repeats, and other large structural variants. Notably, it also covers all standard steps of NGS data analysis: quality control and normalization of raw data, alignment to a reference genome, variant calling, annotation, and the generation of reports for result interpretation and prioritization.

DNAscan2 is also highly user-friendly: users with very different informatics skills, computing facilities, and application purposes can use it to process, analyze, and interpret NGS data accurately. For instance, users with limited RAM and/or CPU time can run DNAscan2 in a fast mode that skips the most computationally intensive steps. The software is therefore a crucial step forward in bioinformatics research, allowing researchers with limited computational resources to perform NGS data analysis whatever their application.

New and Upgraded Features of DNAscan2 

DNAscan2 offers significant improvements over its predecessor, DNAscan. One notable improvement is that it implements a single protocol that, by default, automatically tailors the computational requirements to the variant types the user wants to analyze.
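The idea of tailoring the workflow to the requested variant types can be sketched as a simple step planner. This is a minimal illustration under stated assumptions, not DNAscan2's code or configuration format: the step names, the `CALLERS` mapping and the `plan_steps` function are hypothetical, the tool names are the ones mentioned in this article, and treating fast mode as simply skipping Delly and STR genotyping is one possible reading of the fast-mode limitations described in the variant-specific sections.

```python
from typing import Dict, List, Set

# Hypothetical mapping from requested variant type to the tools named in the
# article; this is NOT DNAscan2's actual configuration.
CALLERS: Dict[str, List[str]] = {
    "snv_indel": ["Strelka2"],
    "sv": ["Manta", "Delly"],
    "mei": ["MELT"],
    "str": ["str_genotyping"],  # hypothetical placeholder name for the STR step
}

# Steps the article says are unavailable in fast mode (assumed here to be skipped).
NOT_IN_FAST_MODE: Set[str] = {"Delly", "str_genotyping"}


def plan_steps(variant_types: List[str], fast_mode: bool = False) -> List[str]:
    """Return an ordered list of steps covering only the requested variant types."""
    steps = ["quality_control", "alignment"]
    for vtype in variant_types:
        for tool in CALLERS[vtype]:
            # In fast mode, drop the steps described as too resource-hungry.
            if fast_mode and tool in NOT_IN_FAST_MODE:
                continue
            if tool not in steps:  # avoid scheduling the same tool twice
                steps.append(tool)
    steps.append("annotation")
    return steps


print(plan_steps(["snv_indel"]))
# ['quality_control', 'alignment', 'Strelka2', 'annotation']
print(plan_steps(["snv_indel", "sv"], fast_mode=True))
# ['quality_control', 'alignment', 'Strelka2', 'Manta', 'annotation']
```

Only the callers needed for the requested variant types are scheduled, which is the essence of keeping computational cost proportional to what the user asks for.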

There are also improvements in the analysis of each variant type, as follows:

  1. SNV and Indel Calling

Instead of FreeBayes and GATK HaplotypeCaller, which DNAscan used, DNAscan2 uses the Strelka2 small variant caller for both SNV and indel calling. While SNV performance did not change substantially, indel detection showed a significant improvement in precision and F-measure. This was demonstrated on both NA12878 whole-exome sequencing (WES) and HG002 whole-genome sequencing (WGS) samples, for standard calls as well as for medically relevant variants in challenging regions.
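Precision, recall and F-measure (F1) are the standard benchmark metrics behind comparisons like these. A minimal sketch of how they are computed from true-positive (TP), false-positive (FP) and false-negative (FN) counts; the counts in the example are made up for illustration and are not results from the DNAscan2 paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F-measure (harmonic mean of the two)."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # fraction of calls that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # fraction of true variants recovered
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only: 90 correct calls, 10 spurious calls, 30 missed variants.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because F1 is a harmonic mean, raising precision while recall stays the same also raises F1, which is why fewer false indel calls show up in both metrics.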

  2. Structural Variant Calling

DNAscan2 adds a new tool, Delly, to its analysis pipeline. Delly is a software tool that can call deletions, inversions, tandem duplications, and translocations. Adding Delly resulted in a much higher F-measure for small and medium deletions and higher precision for small haplotype-resolved inversion calls. In addition, almost all true-positive inversion calls were made either by Delly alone or by both Delly and Manta. However, this improved performance comes at a cost: runtime increases by 24–30 hours compared with the previous version. Also, the fast mode described above cannot be used for structural variant (SV) calling now that Delly is included. Despite these limitations, the gain in the accuracy and sensitivity of SV calling is noteworthy.
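Attributing true-positive calls to Delly alone or to both Delly and Manta implies matching each caller's calls against a truth set and comparing the matched sets. The sketch below does this with a 50% reciprocal-overlap rule, a common convention in SV benchmarking that is assumed here; it ignores SV type for brevity, the coordinates and function names are hypothetical, and it is not the benchmarking method used in the DNAscan2 paper.

```python
from typing import Dict, List, Set, Tuple

# (chromosome, start, end) with 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length as a fraction of the longer of the two intervals."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[2] - a[1], b[2] - b[1])


def matched_truth(calls: List[Interval], truth: List[Interval], min_ro: float = 0.5) -> Set[int]:
    """Indices of truth-set variants matched by at least one call."""
    return {
        i for i, t in enumerate(truth)
        if any(reciprocal_overlap(c, t) >= min_ro for c in calls)
    }


def attribute_true_positives(delly: List[Interval], manta: List[Interval],
                             truth: List[Interval], min_ro: float = 0.5) -> Dict[str, int]:
    """Count truth variants found only by Delly, only by Manta, or by both."""
    d = matched_truth(delly, truth, min_ro)
    m = matched_truth(manta, truth, min_ro)
    return {"delly_only": len(d - m), "manta_only": len(m - d), "shared": len(d & m)}


# Hypothetical truth set and calls.
truth = [("chr1", 1000, 2000), ("chr1", 5000, 5600), ("chr2", 300, 900)]
delly_calls = [("chr1", 1010, 1990), ("chr2", 310, 880)]
manta_calls = [("chr2", 300, 900)]
print(attribute_true_positives(delly_calls, manta_calls, truth))
# {'delly_only': 1, 'manta_only': 0, 'shared': 1}
```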

  3. Transposable Element and Short Tandem Repeat Discovery 

New state-of-the-art tools substantially improve the detection of mobile element insertions (Alu, SVA, and LINE1), which can now be identified and genotyped with MELT. DNAscan2 can also produce a genome-wide profile of non-reference short tandem repeat (STR) loci, containing information about each repeat (e.g., repeat size and motif composition). However, STR genotyping cannot be performed in fast mode and is therefore inaccessible to users with limited computational resources.
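A genome-wide STR profile is essentially a table of loci with their repeat size and motif. A minimal sketch of such a record and of summarizing loci by motif follows. The `STRLocus` fields are hypothetical, `repeat_size` is assumed to mean the span in base pairs, and grouping motifs by a canonical form (the lexicographically smallest rotation of the motif or its reverse complement, so that CAG, AGC and CTG count as the same repeat) is an assumed convention, not necessarily the one DNAscan2 uses.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Complement table for DNA bases (other characters are left unchanged).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class STRLocus:
    """Hypothetical record for one non-reference STR locus."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    motif: str

    @property
    def repeat_size(self) -> int:
        """Repeat size as the span in base pairs (an assumed definition)."""
        return self.end - self.start


def canonical_motif(motif: str) -> str:
    """Smallest rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    revcomp = motif.translate(COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))]
    return min(rotations)


def motif_profile(loci: List[STRLocus]) -> Dict[str, int]:
    """Count loci per canonical motif."""
    return dict(Counter(canonical_motif(locus.motif) for locus in loci))


# Hypothetical loci.
loci = [
    STRLocus("chr4", 100, 160, "CAG"),
    STRLocus("chr9", 500, 530, "AGC"),
    STRLocus("chrX", 900, 990, "CTG"),
    STRLocus("chr1", 10, 40, "AT"),
]
print(motif_profile(loci))  # {'AGC': 3, 'AT': 1}
```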

  4. Variant Annotation and Report Generation 

DNAscan2 extends the range of variants that can be annotated to structural variants and mobile element insertions, using the AnnotSV tool. Users can also define their own ANNOVAR databases to annotate known and novel repeat expansions. For convenience, DNAscan2 provides an HTML report of the variants annotated with AnnotSV, along with a general annotation report covering all identified variants.
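As an illustration of the HTML report idea, the sketch below renders annotated variants as an HTML table using only Python's standard library. The column names, gene names and variant records are hypothetical and do not reflect the layout of DNAscan2's AnnotSV report. Escaping every value with `html.escape` matters because VCF fields such as symbolic alleles (`<DEL>`) would otherwise be parsed by the browser as tags.

```python
import html
from typing import Dict, List


def variants_to_html(variants: List[Dict[str, str]], columns: List[str]) -> str:
    """Render variant records as an HTML table, escaping every value."""
    header = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    rows = [
        "<tr>" + "".join(f"<td>{html.escape(str(v.get(col, '')))}</td>" for col in columns) + "</tr>"
        for v in variants
    ]
    return "\n".join(["<table>", f"<tr>{header}</tr>", *rows, "</table>"])


# Hypothetical annotated variants; the symbolic ALT allele must be escaped.
variants = [
    {"chrom": "chr1", "pos": "1000", "alt": "<DEL>", "gene": "GENE1"},
    {"chrom": "chr2", "pos": "2500", "alt": "T", "gene": "GENE2"},
]
print(variants_to_html(variants, ["chrom", "pos", "alt", "gene"]))
```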

Implementing DNAscan2 and its Computational Requirements

DNAscan2 is written in Python 3, is open source, and can be downloaded free of charge from GitHub. The software and its database dependencies can be installed manually, with a bash helper script, or through a graphical user interface (GUI). The GUI makes the software considerably more accessible, while a Snakemake workflow makes it a highly scalable command-line tool that can be run on high-performance computing facilities.
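Workflow managers such as Snakemake work out the order of steps from their dependencies and run independent steps in parallel, which is what lets a pipeline scale on a cluster. The sketch below mimics that scheduling idea with Python's standard `graphlib` module (Python 3.9+) rather than Snakemake itself; the dependency graph is a hypothetical simplification, not DNAscan2's actual Snakefile.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency graph: each step maps to the steps it needs first.
DEPENDENCIES: Dict[str, Set[str]] = {
    "alignment": {"quality_control"},
    "strelka2": {"alignment"},
    "manta": {"alignment"},
    "delly": {"alignment"},
    "melt": {"alignment"},
    "annotation": {"strelka2", "manta", "delly", "melt"},
    "report": {"annotation"},
}


def parallel_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves; all steps in a wave could run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose inputs are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in parallel_waves(DEPENDENCIES):
    print(wave)
```

After alignment, the four callers have no dependencies on one another, so a scheduler can dispatch them as separate cluster jobs at the same time.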

DNAscan2 is also computationally optimized, requiring fewer computational resources than DNAscan. For instance, average memory usage in the SNV and indel calling stages for WGS data shows a 97% improvement relative to DNAscan.
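A percentage improvement in memory usage is most naturally read as a relative reduction: a 97% improvement means the new average usage is 3% of the old one. A small helper makes the arithmetic explicit; the 100 GB and 3 GB figures are illustrative only, not measurements from the paper.

```python
def percent_reduction(old: float, new: float) -> float:
    """Relative reduction of `new` compared with `old`, in percent."""
    if old <= 0:
        raise ValueError("old value must be positive")
    return 100 * (old - new) / old


# Illustrative values: average memory dropping from 100 GB to 3 GB.
print(percent_reduction(100, 3))  # 97.0
```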


In conclusion, DNAscan2 is a versatile, computationally efficient, scalable, user-friendly, end-to-end pipeline for NGS data analysis, designed to meet the needs of a broad range of users with varying bioinformatics skills, computing facilities, and application purposes. With its ease of use and improved variant calling protocols, the software appears well suited to accurate and efficient NGS data analysis, ultimately supporting biomedical research and clinical genetics.

Article Source: Reference Paper | DNAscan2 Available at: GitHub


Diyan Jain is a second-year undergraduate majoring in Biotechnology at Imperial College London and is currently interning as a scientific content writer at CBIRT. His passion for writing and science led him to pursue this opportunity to communicate cutting-edge research and discoveries to a broader public in an engaging way. Diyan is also working on a personal research project evaluating the potential of genome sequencing studies and GWAS to identify disease likelihood and determine personalized treatments. With his fascination for bioinformatics and science communication, he is committed to delivering high-quality content at CBIRT.


