A recent article in Nature Methods introduced the PanGenome Research Tool Kit (PGR-TK), a toolkit for analyzing structural and haplotype variation across a pangenome. It opens a new way to explore complex genomic regions that are linked to medical conditions but were previously difficult to inspect. Chen-Shan Chin et al., who developed PGR-TK, demonstrate its capability by visualizing variants in repetitive genes and drawing deeper insights from them.

A Brief Outline of PGR-TK

PGR-TK (PanGenome Research Tool Kit) provides tools for building an indexed sequence database and for fetching and querying sequences of interest from it. Pangenome graphs are then created for those regions rather than for the whole genome at once, which is computationally expensive. PGR-TK uses minimizer anchors to generate pangenome graphs at different scales, without the more computationally intensive sequence-to-sequence alignment and without explicitly calling variants against a reference. Instead, it treats all input sequences equally, with no preferred reference, when generating a pangenome graph.
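To make the idea of minimizer anchors concrete, here is a minimal sketch of plain window minimizers. It is not PGR-TK's implementation: the function names, the CRC32 hash, the values of `k` and `w`, and the toy sequences are all assumptions chosen for illustration, and the hierarchical sparsification that PGR-TK applies is left out. The point is only that two sequences sharing a stretch of bases select the same anchors there, without any base-level alignment.

```python
import zlib


def kmer_hash(kmer: str) -> int:
    """Deterministic hash used to rank k-mers (CRC32 is an illustrative choice)."""
    return zlib.crc32(kmer.encode("ascii"))


def minimizers(seq: str, k: int = 4, w: int = 3) -> list[tuple[int, str]]:
    """Return (position, k-mer) for the smallest-hash k-mer of every window of w k-mers."""
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return []
    hashes = [kmer_hash(seq[i:i + k]) for i in range(n_kmers)]
    selected = set()
    for start in range(max(1, n_kmers - w + 1)):
        window = range(start, min(start + w, n_kmers))
        # Ties are broken by position so the result is deterministic.
        selected.add(min(window, key=lambda i: (hashes[i], i)))
    return [(i, seq[i:i + k]) for i in sorted(selected)]


# Two toy "haplotypes" that differ by a single substitution.
hap1 = "ACGTTGCATGCCGATTAGGCTAACG"
hap2 = hap1.replace("GCCG", "GTCG")

anchors1 = minimizers(hap1)
anchors2 = minimizers(hap2)

# K-mers chosen as anchors in both sequences mark their shared stretches.
shared = {kmer for _, kmer in anchors1} & {kmer for _, kmer in anchors2}
print(anchors1)
print(sorted(shared))
```

Because each anchor depends only on the k-mers inside its own window, regions that are identical between haplotypes yield identical anchors, which is what lets a graph be built by matching anchors instead of aligning sequences.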

PGR-TK is built to apply computational techniques and data structures developed for fast genome assemblers to pangenome analysis tasks. In addition, the authors implement an algorithm that decomposes tangled pangenome graphs into manageable units called principal bundles. Projecting the linear genomic sequences onto these principal bundles reveals how repeat and rearrangement variations differ among haplotypes and enables straightforward visualization.
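The projection step can be pictured with a small sketch. It assumes the bundle decomposition has already been done and is handed to us as a lookup table; the decomposition algorithm itself is not shown. The vertex names, bundle ids, haplotype paths, and the function `project_onto_bundles` are hypothetical and are not part of PGR-TK's API.

```python
from itertools import groupby


def project_onto_bundles(path, bundle_of):
    """Collapse a vertex path into runs of (bundle_id, number_of_vertices).

    Vertices that belong to no bundle are labeled None.
    """
    labels = [bundle_of.get(vertex) for vertex in path]
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]


# Hypothetical bundle assignment: vertex name -> bundle id.
bundle_of = {"v1": 0, "v2": 0, "v3": 1, "v4": 1, "v5": 2}

# Hypothetical haplotype paths through the graph.
paths = {
    "hap1": ["v1", "v2", "v3", "v4", "v5"],
    "hap2": ["v1", "v2", "v3", "v4", "v3", "v4", "v5"],  # bundle 1 traversed twice
    "hap3": ["v1", "v2", "x9", "v5"],  # x9 lies outside any bundle
}

for name, path in paths.items():
    print(name, project_onto_bundles(path, bundle_of))
```

Each haplotype becomes a short list of bundle segments. Comparing those lists side by side shows the duplication in hap2 and the haplotype-specific stretch in hap3 far more readily than the raw graph does.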

PGR-TK uses the Assembled Genomes Compressor (AGC) to store pangenome assembly contigs and includes a binary for creating the sparse hierarchical minimizer (SHIMMER) index. From this index it generates a minimizer anchored pangenome graph (MAP-graph), a comprehensive representation of genomic variation. PGR-TK offers a set of efficient command-line tools for various tasks and also supports interactive, in-depth analysis through integration with JupyterLab and other data science tools. The linear representation derived from the MAP-graph allows efficient identification and classification of repeats, which are otherwise hard to characterize.
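A toy version of such an index and graph might look like the following. This is a sketch under stated assumptions, not PGR-TK's data structure: it assumes anchors have already been computed per sequence, treats each pair of adjacent anchors as a graph vertex, and adds an edge when two pairs are consecutive in some input sequence. The function names, anchor k-mers, and positions are invented for illustration.

```python
from collections import defaultdict


def build_pair_index(anchors_by_seq):
    """Map each adjacent anchor pair to the (seq_id, start, end) spans it covers."""
    index = defaultdict(list)
    for seq_id, anchors in anchors_by_seq.items():
        for (pos_a, kmer_a), (pos_b, kmer_b) in zip(anchors, anchors[1:]):
            index[(kmer_a, kmer_b)].append((seq_id, pos_a, pos_b))
    return dict(index)


def build_graph_edges(anchors_by_seq):
    """Connect anchor pairs that follow each other in at least one sequence."""
    edges = set()
    for anchors in anchors_by_seq.values():
        kmers = [kmer for _, kmer in anchors]
        pairs = list(zip(kmers, kmers[1:]))
        for left, right in zip(pairs, pairs[1:]):
            edges.add((left, right))
    return edges


# Hypothetical anchors (position, k-mer) for two haplotypes of one region.
# The third anchor differs, as it would around a variant.
anchors_by_seq = {
    "hap1": [(0, "ACGT"), (12, "GGCA"), (25, "TTAG"), (40, "CCGA")],
    "hap2": [(0, "ACGT"), (12, "GGCA"), (31, "AATC"), (47, "CCGA")],
}

index = build_pair_index(anchors_by_seq)
edges = build_graph_edges(anchors_by_seq)

# Pairs seen in both haplotypes are shared vertices; the rest form a "bubble".
shared = {
    pair for pair, hits in index.items()
    if len({seq_id for seq_id, _, _ in hits}) == 2
}
print(sorted(shared))
print(len(index), "vertices,", len(edges), "edges")
```

Every input sequence contributes to the index in the same way, so no haplotype acts as the reference. Looking up an anchor pair returns every sequence and coordinate range that contains it, which is the kind of query a region-level fetch relies on.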

Importance of a Pangenome Analysis Toolkit

The Human Genome Project made the scientific community hopeful of a complete understanding of cellular machinery, since the genome regulates nearly every aspect of the cell. There was also optimism that the genetic basis of disease phenotypes would be decoded. Dramatic advances in DNA sequencing have further encouraged genomics research, but those expectations have yet to be met. Breakthroughs in genomics have certainly advanced our knowledge, yet many fundamental questions remain unanswered.

In this context, pangenome analysis emerged to capture most of the diversity in human genomes across populations. A pangenome represents the entire set of genes of a species. Genome projects usually generate a single reference genome against which the genomes of individuals are compared to detect variation. The major problem that pangenome studies address is that no reference genome is ideal: every single reference carries intrinsic biases that lead to discrepancies. The release by the HPRC (Human Pangenome Reference Consortium) of 47 human genome assemblies (94 diverse haplotypes) is therefore significant in the current landscape.

A pangenome represents genetic structure and variation across different individuals, but this diversity and complexity make interpretation challenging. Variation is conventionally visualized and analyzed with various graph representation tools. For example, the authors mention the variation graph toolkit (vg) and PanGenie, which focus on improving variant calling and genotyping.

Other approaches, such as the Stringomics graph with 'stringlet', seqwish, and de Bruijn graph-based methods, provide algorithms and data structures that improve storage and query efficiency and reduce the bias introduced by alignment. These graphs give a more accessible picture of repeats and rearrangements than conventional multiple sequence alignment (MSA), which is both complicated to analyze and computationally intensive. Drawing inferences about complex repeats and variation is also difficult with MSA.

A pangenome graph can instead represent relationships in complex regions through graph edge connections, which are easier to interpret. The researchers identify a gap here: tools for revealing and comparing features of many different haplotypes through graphs are still scarce. Hence a generalized graph framework delivered as a software package, PGR-TK, can be a great aid.

Analysis of Complex Repetitive and Clinically Relevant Genes with PGR-TK

PGR-TK can resolve and visualize the most complex regions of the human genome, many of which matter for medically relevant phenotypes. For instance, the MHC (Major Histocompatibility Complex) locus is important for investigating adaptive immunity and autoimmune disorders. However, the region is highly polymorphic, which makes it challenging to characterize it clearly and completely across the human population and to benchmark variant calls. The paper describes how PGR-TK supports careful variant calling, variant representation, and comparison in such complicated scenarios.

With PGR-TK, it is possible to gain an intuitive understanding of genes with nested palindromic and tandem repeats, as well as ampliconic genes. The Genome in a Bottle (GIAB) consortium has marked numerous genes that are difficult to analyze but carry clinical importance. The researchers analyze the GIAB challenging medically relevant genes (CMRG) with a pangenome graph approach, which should help the research community adopt the pangenome resource for clinical and medical genetic applications.

Conclusion

The HPRC release is significant for disease diagnosis and personalized therapy, and pangenome analysis tools are essential for harnessing its full advantages. From this perspective, PGR-TK can unlock insights from regions that are difficult to analyze but hold great importance, such as highly polymorphic regions, palindromes, tandem repeats, and other repetitive sequences.

PGR-TK may not yet be suitable for analysis at the whole-genome scale. The most challenging part for researchers will be obtaining optimized results given the diversity of human genomes, so visualization parameters should be tuned with care. That aside, PGR-TK can be a great aid for further medical research.

Article Source: Reference Paper | Access: Source code


Aditi is a consulting scientific writing intern at CBIRT, specializing in explaining interdisciplinary and intricate topics. As a student pursuing an Integrated PG in Biotechnology, she is driven by a deep passion for experiencing multidisciplinary research fields. Aditi is particularly fond of the dynamism, potential, and integrative facets of her major. Through her articles, she aspires to decipher and articulate current studies and innovations in the Bioinformatics domain, aiming to captivate the minds and hearts of readers with her insightful perspectives.
