Unveiling Flexiplex: A Versatile and Fast Demultiplexing and Sequence Searching Tool for Omics Data

February 23, 2024

Omics datasets generated by high throughput sequencing contain vast amounts of nucleotide data that must be analyzed to extract meaningful biological insights. A common early step is identifying and extracting specific sequences of interest, like genetic barcodes or variants. Existing tools have limitations for this task, motivating the development of Flexiplex, a new versatile sequence searching and demultiplexing tool.

The Need for Flexible Sequence Searching in Omics Data

High throughput sequencing techniques allow entire genomes, transcriptomes, and more to be read at base pair resolution. For example, a single RNA sequencing experiment can generate billions of short sequence reads. To analyze this flood of data, researchers first must extract the particular reads relevant to their specific project. This could involve:

Finding reads from cells expressing certain genes or mutations
Demultiplexing reads by sample barcode to attribute them correctly
Identifying reads from specific cell types in a mixture

Standard command line tools have drawbacks for sequence searching on large omics datasets:

grep finds exact matches only, with no tolerance for sequencing errors
Other tools allow mismatches but are slow on big data
Most search for a small number of sequences, not thousands of barcodes
Output formats may not work with downstream analysis
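
The gap between exact and error-tolerant search can be illustrated with a minimal Python sketch. It uses a textbook semi-global edit distance (the pattern may match anywhere in the read), not the code of any tool named in this article; the sequences and the function name best_match_distance are made up for illustration.

```python
def best_match_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # Semi-global dynamic programming: the first row is all zeros,
    # so a match is free to start at any position in the text.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (p != t),  # match or substitution
                prev[j] + 1,             # pattern base absent from the text
                curr[j - 1] + 1,         # extra base in the text
            )
        prev = curr
    # The match is also free to end anywhere, so take the best end position.
    return min(prev)


query = "ACGTACGTAC"                              # hypothetical sequence of interest
read_exact = "TTTT" + "ACGTACGTAC" + "GGGG"       # contains the query unchanged
read_one_error = "TTTT" + "ACGTTCGTAC" + "GGGG"   # same region with one substitution

print(query in read_one_error)                       # exact search misses this read
print(best_match_distance(query, read_exact))        # distance to the unchanged copy
print(best_match_distance(query, read_one_error))    # distance to the erroneous copy
```

A read would then be reported as a hit whenever the distance is at or below a chosen error threshold.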

Specialized demultiplexing tools exist but have limitations:

Designed for specific experiment types like 10x single-cell RNA-seq
Often cannot handle highly noisy reads like long-read sequencing
Many are complex to set up and require extensive computing resources

There is a need for a flexible sequence search and demultiplexing tool that is fast, lightweight, and customizable to diverse omics applications.

Introducing Flexiplex for Versatile Sequence Analysis

To address the limitations of existing methods, researchers at the Walter and Eliza Hall Institute of Medical Research developed Flexiplex, an efficient tool for searching reads and demultiplexing barcodes. Flexiplex has several key features:

Finds approximate matches allowing substitutions, insertions, and deletions
Demultiplexes reads by finding the best barcode match from a large list
Splits chimeric reads containing multiple barcodes
Easy to install and run with minimal dependencies
Customizable – flanking sequences, barcode lists, error tolerance, etc
Multithreaded for fast processing of large datasets

Flexiplex uses a combination of two algorithms:

Edlib for rapid flanking sequence search
Custom dynamic programming method for optimal barcode alignment

This combination provides both speed and tolerance of errors in the barcode and UMI regions.
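
The two-stage idea (locate the flanking sequence first, then align the adjacent bases against the barcode list) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Flexiplex's implementation: it replaces Edlib with a small hand-written dynamic program, takes a fixed-length window after the flank, and uses hypothetical function names, flank, barcodes, and thresholds.

```python
def find_flank_end(flank, read, max_errors):
    """End index (exclusive) of the best approximate flank match, or None."""
    prev = [0] * (len(read) + 1)          # zeros: the match may start anywhere
    for i, f in enumerate(flank, start=1):
        curr = [i] + [0] * len(read)
        for j, r in enumerate(read, start=1):
            curr[j] = min(prev[j - 1] + (f != r), prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    best_end = min(range(len(prev)), key=prev.__getitem__)
    return best_end if prev[best_end] <= max_errors else None


def edit_distance(a, b):
    """Plain Levenshtein distance between two whole strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(prev[j - 1] + (x != y), prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[-1]


def assign_barcode(read, flank, known_barcodes, flank_errors=2, barcode_errors=1):
    """Return the single best-matching known barcode, or None if unassigned."""
    end = find_flank_end(flank, read, flank_errors)
    if end is None:
        return None                        # stage 1 failed: no flank found
    candidate = read[end:end + len(known_barcodes[0])]
    scored = sorted((edit_distance(candidate, bc), bc) for bc in known_barcodes)
    best_dist, best_bc = scored[0]
    tied = len(scored) > 1 and scored[1][0] == best_dist
    if best_dist > barcode_errors or tied:
        return None                        # too many errors, or ambiguous
    return best_bc


flank = "CTTCCGATCT"                                   # hypothetical flank
barcodes = ["AAACCCGGGT", "TTTGGGCCCA", "ACGTACGTAC"]   # hypothetical barcode list
# One substitution in the flank and one in the barcode:
noisy_read = "GG" + "CTTCCGTTCT" + "AAACCCGGCT" + "TTTTTTTT"
print(assign_barcode(noisy_read, flank, barcodes))
```

Running the cheap flank search first means the more careful barcode comparison only has to look at a short window of each read.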

Flexiplex can be used in two modes:

Search for user-provided sequences allowing mismatches
Discover novel barcodes directly from the data

Benchmarking Flexiplex’s Performance

To validate Flexiplex’s capabilities, the researchers tested it on real and simulated sequencing datasets, comparing it to leading specialized tools.

Accurate Sequence Search in Low-Error Short Reads

Flexiplex was first benchmarked for searching known sequences in low-error Illumina data. The Chen et al. single-cell mixture dataset has:

Fusion gene unique to MCF-7 cells
Viral gene unique to HEK293T cells
SNP unique to T47D cells

Using 34-54bp segments from these genes, Flexiplex efficiently extracted matching reads:

Processed 200 million reads in 24 minutes (1 thread)
10X faster than similar tools like seqkit grep
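
As a quick back-of-the-envelope check, the single-thread figure quoted above can be converted into a per-second rate. The snippet uses only the two numbers given in this article.

```python
reads = 200_000_000   # reads processed, as quoted above
minutes = 24          # run time on one thread, as quoted above

reads_per_second = reads / (minutes * 60)
print(f"{reads_per_second:,.0f} reads per second")
```

That is roughly 139,000 reads per second on a single thread.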

Allowing 1-2 mismatches boosted sensitivity:

Found 97% more reads for the MCF-7 fusion than grep's exact matching
Cellular barcode analysis showed high precision – almost no false positives

This demonstrates Flexiplex's power for fast and accurate sequence search, even in low-error data.

Demultiplexing Noisy Long Reads

Next, the researchers tested demultiplexing cellular barcodes from noisy Oxford Nanopore long reads. On Ebrahimi et al.’s simulated ONT dataset, Flexiplex correctly demultiplexed the most reads across all error rates. To validate this on real data, the researchers used Tian et al.’s scmixology 2 dataset:

A pool of 5 cell lines with ONT cDNA reads
Matched Illumina data for orthogonal cell line validation

When its assignments were checked against Illumina SNPs and short-read barcodes, Flexiplex achieved the highest accuracy:

99% concordant cell line assignments between methods
Outperformed specialized tools like scTagger and FLAMES

Flexiplex also split chimeric reads effectively, further boosting performance. This shows Flexiplex’s robustness for demultiplexing even highly erroneous reads.
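
One simple way to picture chimeric read splitting: if the flanking sequence appears more than once in a read, cut the read just before each later occurrence. The Python sketch below is an illustration only and is not how Flexiplex is implemented. It tolerates substitutions but not insertions or deletions, ignores reverse-complement copies, and uses hypothetical function names and sequences.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_flank_starts(read, flank, max_mismatches):
    """Start positions of non-overlapping approximate flank occurrences."""
    starts = []
    i = 0
    while i + len(flank) <= len(read):
        if hamming(read[i:i + len(flank)], flank) <= max_mismatches:
            starts.append(i)
            i += len(flank)   # jump past this hit so hits cannot overlap
        else:
            i += 1
    return starts


def split_chimeric(read, flank, max_mismatches=1):
    """Split a read before every flank occurrence after the first one."""
    starts = find_flank_starts(read, flank, max_mismatches)
    cuts = [0] + starts[1:] + [len(read)]
    return [read[a:b] for a, b in zip(cuts, cuts[1:])]


flank = "CTTCCGATCT"                                    # hypothetical flank
first = "GG" + flank + "AAAACCCC" + "TTTTTTTT"
second = "CTTCCGTTCT" + "GGGGTTTT" + "AAAA"             # flank copy with one mismatch
for piece in split_chimeric(first + second, flank):
    print(piece)
```

Each piece can then be demultiplexed on its own, so a read carrying two barcodes is not forced into a single assignment.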

Discovering Cell Barcodes from Scratch

Finally, the researchers tested Flexiplex's ability to discover novel barcodes without any prior barcode list. The leading tools were compared on Tian et al.'s data and three hiPSC ONT datasets from You et al. Flexiplex's sensitivity and specificity for recovering true barcodes were competitive with those of other tools. Critically, Flexiplex was 4-40X faster than specialized tools like scTagger and BLAZE for barcode discovery.
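
A common way to discover barcodes without a reference list is to collect the bases that follow the flanking sequence in every read and keep the sequences seen most often. The Python sketch below shows that counting idea only. It is an assumption made for illustration rather than a description of Flexiplex's internals; it uses exact flank matching for brevity, and the function name, flank, reads, and threshold are all hypothetical.

```python
from collections import Counter


def discover_barcodes(reads, flank, barcode_length, min_count):
    """Count candidate barcodes found right after the flank; keep frequent ones."""
    counts = Counter()
    for read in reads:
        start = read.find(flank)            # exact flank search, for brevity
        if start == -1:
            continue                        # no flank, so no candidate barcode
        bc_start = start + len(flank)
        barcode = read[bc_start:bc_start + barcode_length]
        if len(barcode) == barcode_length:  # skip reads truncated mid-barcode
            counts[barcode] += 1
    # most_common() sorts by descending count.
    return [(bc, n) for bc, n in counts.most_common() if n >= min_count]


reads = [
    "GATCTAAAAGG", "TTGATCTAAAAGG", "GATCTAAAATT",   # barcode AAAA, three reads
    "GATCTCCCCGG", "CGATCTCCCCAA",                   # barcode CCCC, two reads
    "GATCTAAATGG",                                   # likely an error of AAAA
    "TTTTTTTT",                                      # no flank
    "GATCTAA",                                       # truncated barcode
]
print(discover_barcodes(reads, flank="GATCT", barcode_length=4, min_count=2))
```

Rare sequences such as the single AAAT read fall below the count threshold, which is how this kind of approach separates real barcodes from sequencing errors.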

Conclusions

Flexiplex enables fast and customizable analysis of sequencing reads to extract biological signals from noise. Benchmarks on real datasets demonstrate:

High accuracy – finds true matches robustly, even with errors
Speed – processes data rapidly leveraging multithreading
Low resource – memory efficient compared to alternatives
Easy to use – simple install and runtime

Flexiplex balances generality and specialization – adaptable to diverse experiments while still highly performant. It addresses the growing need for efficient sequence search and demultiplexing as omics datasets scale exponentially. With its combination of accuracy, speed, flexibility, and usability, Flexiplex represents an important new addition to the omics analysis toolkit.

Article source: Reference Paper | Flexiplex is available on GitHub

Dr. Tamanna Anwar

Dr. Tamanna Anwar is a Scientist and Co-founder of the Centre of Bioinformatics Research and Technology (CBIRT). She is a passionate bioinformatics scientist and a visionary entrepreneur. Dr. Tamanna has worked as a Young Scientist at Jawaharlal Nehru University, New Delhi. She has also worked as a Postdoctoral Fellow at the University of Saskatchewan, Canada. She has several scientific research publications in high-impact research journals. Her latest endeavor is the development of a platform that acts as a one-stop solution for all bioinformatics related information as well as developing a bioinformatics news portal to report cutting-edge bioinformatics breakthroughs.
