Thursday, January 22, 2026

PanMAN Enables Scalable High-Resolution Pangenome Analysis

Image: PanMAN (source: https://turakhia.ucsd.edu/panman/)

The study, led by researchers from UC San Diego and UC Santa Cruz, introduces PanMAN, a new data structure designed to overcome the limitations of current pangenome formats. Applied to more than 8 million SARS-CoV-2 genomes, PanMAN compressed the entire dataset into just 366 MB while still preserving all of its variation.

What is Pangenomics?

Studying genomes helps us understand the biology of a species at its most fundamental level. Earlier, scientists sequenced a single genome (as in the Human Genome Project) and used it as the ‘reference’ for the entire species; individual genomes were then compared to this reference to spot differences such as mutations and other variants. But this approach missed much of the natural variation within a species and could not capture rare mutations or recombination events.

This is where pangenomics, a branch of bioinformatics, comes into the picture. It studies thousands or even millions of genomes together, revealing the mutations, evolutionary history, and diversity that shape traits such as disease resistance.

Recent technological advances have made genome sequencing fast and cheap, so we can now study the full spectrum of genetic diversity in any species. Storing and analyzing millions of genomes, however, demands heavy computational resources. For example, graph-based formats represent genetic variation but do not capture evolutionary or mutational history, and their huge storage requirements make them impractical at this scale.

One way to tackle this problem is compressive pangenomics, the approach proposed by the UC San Diego team. It works much like ZIP and other lossless compression formats, which shrink large files without losing any of the original information. Unlike ZIP archives, however, the compressed data does not need to be decompressed before it can be studied, saving time and computational power.
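The "analyze without decompressing" idea can be illustrated with a minimal Python sketch. It assumes a toy encoding in which each genome is stored only as its substitutions relative to a shared reference; `REFERENCE`, `GENOMES`, `base_at` and `carriers` are hypothetical names for illustration, not PanMAN's actual file format or API.

```python
# Toy "compressed" collection (hypothetical encoding, not PanMAN's format):
# each genome is stored only as {position: base} substitutions relative
# to one shared reference sequence. Positions are 0-based.
REFERENCE = "ACGTACGTAC"

GENOMES: dict[str, dict[int, str]] = {
    "sample_A": {2: "T"},          # G->T at position 2
    "sample_B": {2: "T", 7: "A"},  # G->T at 2 and T->A at 7
    "sample_C": {},                # identical to the reference
}


def base_at(genome: str, pos: int) -> str:
    """Return the base of `genome` at `pos` without rebuilding its sequence."""
    return GENOMES[genome].get(pos, REFERENCE[pos])


def carriers(pos: int, base: str) -> list[str]:
    """List genomes carrying `base` at `pos`, looking only at stored diffs."""
    return [name for name in GENOMES if base_at(name, pos) == base]


print(base_at("sample_B", 7))  # A
print(carriers(2, "T"))        # ['sample_A', 'sample_B']
```

Neither query ever materializes a full genome string; each answer comes from a dictionary lookup plus, at most, one reference base.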

Understanding PanMAN and its internal architecture

Professor Yatish Turakhia and his team introduced a new data structure called the Pangenome Mutation-Annotated Network (PanMAN), which compresses genome data far more efficiently than existing formats while also encoding richer biological information, such as mutations, phylogenies, annotations (metadata for biological interpretation), and complex events like recombination and horizontal gene transfer.

At its core, PanMAN is built from pangenome mutation-annotated trees (PanMATs; not to be confused with decision trees). Each PanMAT stores an ancestral genome at its root, much like a reference genome, and annotates mutations (substitutions, insertions, and deletions) along its branches.
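A PanMAT can be pictured as an ordinary tree whose branches carry lists of mutations. The Python sketch below is a simplified model, not PanMAN's real data layout: the class names, the `kind`/`pos` encoding, and the choice to express each mutation's position in the parent's ungapped sequence (applied in listed order) are all hypothetical. Any genome in the tree is recovered by starting from the root sequence and applying the mutations on each branch along the path to it.

```python
from dataclasses import dataclass, field


@dataclass
class Mutation:
    kind: str        # "sub", "ins" or "del" (hypothetical encoding)
    pos: int         # 0-based position in the parent's sequence
    seq: str = ""    # new base(s) for "sub" / inserted bases for "ins"
    length: int = 0  # number of bases removed for "del"


@dataclass
class Node:
    name: str
    mutations: list[Mutation] = field(default_factory=list)  # branch into node
    children: list["Node"] = field(default_factory=list)


def apply_mutation(seq: str, m: Mutation) -> str:
    if m.kind == "sub":
        return seq[:m.pos] + m.seq + seq[m.pos + len(m.seq):]
    if m.kind == "ins":
        return seq[:m.pos] + m.seq + seq[m.pos:]
    if m.kind == "del":
        return seq[:m.pos] + seq[m.pos + m.length:]
    raise ValueError(f"unknown mutation kind: {m.kind}")


def reconstruct(root_seq: str, root: Node) -> dict[str, str]:
    """Rebuild every node's sequence by applying branch mutations root-down."""
    out: dict[str, str] = {}
    stack = [(root, root_seq)]
    while stack:
        node, seq = stack.pop()
        for m in node.mutations:
            seq = apply_mutation(seq, m)
        out[node.name] = seq
        for child in node.children:
            stack.append((child, seq))
    return out


ROOT_SEQ = "ACGTACGT"  # hypothetical ancestral genome stored at the root
TREE = Node("root", children=[
    Node("anc1", [Mutation("sub", 1, "T")], children=[
        Node("leaf1", [Mutation("ins", 4, "GG")]),
        Node("leaf2", [Mutation("del", 6, length=2)]),
    ]),
    Node("leaf3", [Mutation("sub", 0, "G")]),
])

for name, seq in sorted(reconstruct(ROOT_SEQ, TREE).items()):
    print(name, seq)
# anc1 ATGTACGT
# leaf1 ATGTGGACGT
# leaf2 ATGTAC
# leaf3 GCGTACGT
# root ACGTACGT
```

Only `ROOT_SEQ` is stored in full; every other genome is implied by a handful of branch mutations.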

Multiple PanMATs are linked into a network whose edges between trees record complex mutational events such as recombination and horizontal gene transfer. Together, the trees and the network edges capture both vertical inheritance and these horizontal events.
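A recombination edge can be modeled in the same toy style. In this hypothetical Python sketch, a network edge names two donor nodes in different PanMATs plus a single breakpoint, and the recombinant genome takes its prefix from one donor and its suffix from the other. The node sequences are assumed to be already reconstructed, and the `RecombinationEdge` type is an illustration only; PanMAN's actual edge representation may differ, and real recombinants can have more than one breakpoint.

```python
from dataclasses import dataclass

# Hypothetical: sequences of selected nodes in two PanMATs, assumed to be
# already reconstructed by walking each tree from its root.
TREE_SEQS: dict[str, dict[str, str]] = {
    "tree_1": {"lineage_X": "AAAAAAAAAA"},
    "tree_2": {"lineage_Y": "CCCCCCCCCC"},
}


@dataclass(frozen=True)
class RecombinationEdge:
    """Toy network edge: the recombinant copies the left donor up to
    `breakpoint`, then the right donor from `breakpoint` onward."""
    left: tuple[str, str]   # (tree, node) donating the prefix
    right: tuple[str, str]  # (tree, node) donating the suffix
    breakpoint: int         # 0-based position where the donors switch


def recombinant_sequence(edge: RecombinationEdge) -> str:
    left_seq = TREE_SEQS[edge.left[0]][edge.left[1]]
    right_seq = TREE_SEQS[edge.right[0]][edge.right[1]]
    return left_seq[:edge.breakpoint] + right_seq[edge.breakpoint:]


edge = RecombinationEdge(("tree_1", "lineage_X"), ("tree_2", "lineage_Y"), 4)
print(recombinant_sequence(edge))  # AAAACCCCCC
```

A single-tree model could only express this genome as a long list of substitutions on one branch; an edge between trees records the event itself.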

PanMAN is compact because it exploits shared ancestry. Each mutation is stored only once, on the branch where it first arose; every descendant genome inherits it from that branch instead of storing its own copy. Avoiding this duplication dramatically reduces storage.
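The saving from exploiting ancestry can be counted on a small example. In this hypothetical Python sketch, three mutations (`m1` to `m3`, generic labels) arise once on the branch leading to a clade of three genomes. Storing them on that branch, rather than repeating them in every descendant's mutation list, halves the number of stored records for this toy tree.

```python
# Hypothetical toy tree: parent -> children, plus the mutations placed on
# the branch leading into each node (generic labels, not real variants).
CHILDREN = {
    "root": ["clade_A", "leaf_4"],
    "clade_A": ["leaf_1", "leaf_2", "leaf_3"],
}
BRANCH_MUTATIONS = {
    "clade_A": ["m1", "m2", "m3"],  # arose once, shared by the whole clade
    "leaf_1": ["m4"],
    "leaf_2": [],
    "leaf_3": ["m5"],
    "leaf_4": ["m6"],
}


def leaf_mutations(node="root", inherited=()):
    """Yield (leaf, full mutation list relative to the root) for each leaf."""
    muts = list(inherited) + BRANCH_MUTATIONS.get(node, [])
    kids = CHILDREN.get(node, [])
    if not kids:
        yield node, muts
    for kid in kids:
        yield from leaf_mutations(kid, muts)


# Tree storage: each mutation appears once, on its branch.
stored_in_tree = sum(len(m) for m in BRANCH_MUTATIONS.values())
# Naive storage: every genome lists all of its mutations itself.
per_genome = dict(leaf_mutations())
stored_naively = sum(len(m) for m in per_genome.values())
print(stored_in_tree, stored_naively)  # 6 12
```

In this scheme the gap widens with clade size: a mutation shared by a million descendants is still a single record on one branch.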

PanMAN was designed to be more than a storage format; it’s a biological representation system!

Case Study: SARS-CoV-2 (the COVID-19 virus): the whys and hows

The researchers tested PanMAN on microbial genomes, mainly SARS-CoV-2, the virus that causes COVID-19, along with E. coli, HIV, and genomes associated with pneumonia and tuberculosis (TB).

SARS-CoV-2 was chosen as a major dataset for several reasons:

  • Massive Datasets: During the pandemic, scientists worldwide sequenced millions of SARS-CoV-2 genomes, giving researchers large, consistent datasets to work with.
  • Rapid Evolution: SARS-CoV-2 mutates quickly, producing many variants. A pangenomic approach captures this variation and its evolutionary history far better than a single reference genome can.
  • Global Relevance: Because the virus affected populations worldwide, analyzing its genomic diversity had immediate public health importance.

The team constructed a pangenome of over 8 million SARS-CoV-2 genomes, the largest ever built.

Storing and analyzing this much genetic data would normally require terabytes of space. With PanMAN, the entire dataset was compressed into just 366 MB, roughly 3,000 times less storage, while preserving all of its variation. This makes PanMAN the most compact of the existing variation-preserving pangenome formats.
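The headline numbers can be sanity-checked with simple arithmetic. This Python snippet only rearranges the figures quoted above (366 MB, roughly 8 million genomes, about 3,000-fold); it assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and treats the genome count as exactly 8 million, so the per-genome figure is an upper bound.

```python
# Figures quoted in the article; decimal units are an assumption.
PANMAN_BYTES = 366e6        # 366 MB
NUM_GENOMES = 8e6           # "over 8 million", so per-genome cost is an upper bound
COMPRESSION_FACTOR = 3_000  # "about 3,000 times less storage"

# Size the same data would occupy without PanMAN, implied by the ratio.
implied_original_tb = PANMAN_BYTES * COMPRESSION_FACTOR / 1e12
# Average PanMAN storage per genome.
bytes_per_genome = PANMAN_BYTES / NUM_GENOMES

print(f"implied uncompressed size: ~{implied_original_tb:.1f} TB")  # ~1.1 TB
print(f"PanMAN storage per genome: ~{bytes_per_genome:.2f} bytes")  # ~45.75 bytes
```

The implied original size of about 1.1 TB is consistent with the "terabytes" estimate, and each genome costs on average fewer than 50 bytes in the compressed form.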

To build such pangenomes for other species, however, the genomes must first be aligned so that substitutions, insertions, and deletions can be identified. At this scale, alignment is itself a computationally prohibitive problem, so Turakhia’s lab developed TWILIGHT, a specialized tool designed to construct alignments across millions of genomes.
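To show what "lining up" genomes means, here is textbook Needleman-Wunsch global alignment of two short sequences in Python, followed by a pass that reads substitutions, insertions, and deletions off the aligned columns. This is a deliberately simple stand-in: TWILIGHT is built to align millions of genomes at once, and this sketch makes no claim about its algorithm. The scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions.

```python
def global_align(ref: str, qry: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(ref), len(qry)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if ref[i - 1] == qry[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_ref, out_qry = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_ref.append(ref[i - 1])
            out_qry.append(qry[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])  # base missing from the query
            out_qry.append("-")
            i -= 1
        else:
            out_ref.append("-")         # base inserted in the query
            out_qry.append(qry[j - 1])
            j -= 1
    return "".join(reversed(out_ref)), "".join(reversed(out_qry))


def call_variants(ref_aln: str, qry_aln: str) -> list[tuple[str, int, str, str]]:
    """Read (kind, ref_position, ref_base, query_base) from aligned columns."""
    calls, ref_pos = [], 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            calls.append(("insertion", ref_pos, "-", q))
        elif q == "-":
            calls.append(("deletion", ref_pos, r, "-"))
        elif r != q:
            calls.append(("substitution", ref_pos, r, q))
        if r != "-":
            ref_pos += 1
    return calls


ref_aln, qry_aln = global_align("ACGTCA", "ACTGA")
print(ref_aln)  # ACGTCA
print(qry_aln)  # AC-TGA
print(call_variants(ref_aln, qry_aln))
# [('deletion', 2, 'G', '-'), ('substitution', 4, 'C', 'G')]
```

Pairwise dynamic programming like this takes time proportional to the product of the two sequence lengths, which is one reason aligning millions of genomes calls for specialized tools.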

Together, TWILIGHT and PanMAN can solve both sides of the challenge:

  1. Building the alignment across millions of genomes
  2. Compressing and encoding the alignments into a compact, biologically rich format.

Future of PanMAN: What do researchers say?

PanMAN and TWILIGHT have been successfully applied to microbial genomes, and the researchers now plan to adapt these tools to human genomes, which are much larger and more complex. This expansion matters because human genetic diversity is vast, and studying it at scale could transform our understanding of disease, evolution, and personalized medicine.

Article Source: Reference Paper | Published Abstract | Reference Article | Availability: GitHub | Documentation.

Disclaimer:
The research discussed in this article was conducted and published by the authors of the referenced paper. CBIRT has no involvement in the research itself. This article is intended solely to raise awareness about recent developments and does not claim authorship or endorsement of the research.


Saniya is a graduating Chemistry student at Amity University Mumbai with a strong interest in computational chemistry, cheminformatics, and AI/ML applications in healthcare. She aspires to pursue a career as a researcher, computational chemist, or AI/ML engineer. Through her writing, she aims to make complex scientific concepts accessible to a broad audience and support informed decision-making in healthcare.
