Interpreting how DNA sequence determines gene expression is one of the most challenging problems in genome biology. Using artificial intelligence (AI) to decode DNA sequence and predict phenotypes, from the molecular to the organismal level, is needed for the next phase of genome biology research.
Google’s DeepMind, in collaboration with Alphabet’s Calico, has introduced Enformer, a transformer-based neural network architecture that can predict gene expression from DNA sequence with high accuracy.
Transformers are a class of deep learning models that have driven substantial advances in natural language processing (NLP). They are built from attention layers, which transform each position in the input sequence by computing a weighted sum of the representations of all positions in the sequence, with the weights reflecting how relevant each position is to the one being updated.
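To make the "weighted sum" concrete, here is a minimal, dependency-free Python sketch of single-head scaled dot-product self-attention. It is a simplification for illustration, not Enformer's implementation: it assumes identity projections (the queries, keys, and values are the input vectors themselves), whereas real transformer layers, Enformer's included, use learned projections, multiple attention heads, and positional encodings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head self-attention with identity projections (Q = K = V = x).

    Returns the transformed vectors and the attention-weight matrix.
    """
    d = len(x[0])
    outputs, attention = [], []
    for q in x:  # every position queries every position, including itself
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        attention.append(weights)
        # New representation = attention-weighted sum of all value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return outputs, attention


# Toy input: 4 sequence positions, each a 2-dimensional embedding.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out, attn = self_attention(x)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` sums to 1, so every output vector is a weighted average over all input positions. Nothing in the computation limits how far apart two positions can be, which is what lets attention-based models relate distant parts of a sequence.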
Previously, deep convolutional neural networks (CNNs) were used to predict gene expression from DNA sequence in the human and mouse genomes. However, these models are limited in that they can only take into account sequence elements up to about 20 kb from the transcription start site (TSS). By using transformer layers, Enformer expanded this reach roughly fivefold, allowing it to integrate regulatory elements up to 100 kb away.
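The CNN limit comes from how a convolutional receptive field grows: each convolution or pooling layer extends it by a fixed amount, so reaching distant sequence requires many layers or large dilations. The sketch below applies the standard receptive-field recurrence to a hypothetical stack of convolution, pooling, and dilated convolution layers. The layer configuration is an assumption chosen for illustration and is not Basenji2's or Enformer's actual architecture.

```python
def receptive_field(layers):
    """Receptive field (in input positions) of a stack of 1-D conv/pooling layers.

    Each layer is (kernel_size, dilation, stride). Standard recurrence:
        rf   += (kernel_size - 1) * dilation * jump
        jump *= stride
    where jump is the spacing, in input positions, between adjacent outputs.
    """
    rf, jump = 1, 1
    for kernel_size, dilation, stride in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical stack: 7 blocks of (conv k=5, then pool k=2 stride 2),
# followed by dilated convolutions (k=3) with doubling dilation.
stack = []
for _ in range(7):
    stack.append((5, 1, 1))  # convolution
    stack.append((2, 1, 2))  # pooling halves the resolution
for d in (1, 2, 4, 8, 16, 32):
    stack.append((3, d, 1))  # dilated convolution

print(receptive_field(stack))  # 16764
```

With this hypothetical stack, each output depends on a window of 16,764 input positions, roughly 8 kb on either side of its center, and extending that window means adding layers. A self-attention layer placed on top would instead let every output position attend to every position of the input at once, so its reach is bounded by the input length rather than by the layer count.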
Enformer’s main goal is to improve our understanding of non-coding genetic variation by predicting the effects of variants on gene expression, for both naturally occurring and synthetic variants. In predicting RNA expression at the TSSs of human protein-coding genes, as measured by Cap Analysis of Gene Expression (CAGE), Enformer substantially outperformed Basenji2, the previous best model.
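Comparisons like this one are commonly scored with the Pearson correlation between predicted and measured expression across genes. The sketch below computes it in plain Python; the function name `pearson_r` and the toy numbers are hypothetical and only show how the metric behaves. They are not data from the Enformer study.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_p * var_o)


# Toy values (hypothetical): predicted vs. measured CAGE signal at four TSSs.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.2, 1.9, 3.1, 3.8]
print(round(pearson_r(predicted, observed), 3))
```

Under this metric, a model that "outperforms" another is one whose predictions correlate more strongly with the measured CAGE signal across genes.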
Researchers may use Enformer to interpret the growing number of disease-associated variants discovered by genome-wide association studies (GWAS). Variants linked to complex genetic diseases are commonly found in non-coding regions of the genome, where they can cause disease by altering gene expression.
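In outline, a sequence-to-expression model scores a variant by predicting expression for the reference sequence and for the same sequence carrying the alternate allele, then taking the difference. The sketch below shows only that reference-versus-alternate comparison. `predict_expression` and `MOTIF` are hypothetical toy stand-ins (a single motif count, not a trained model), and a real model such as Enformer predicts many genomic tracks rather than a single number.

```python
MOTIF = "TATAAA"  # hypothetical toy motif standing in for a regulatory element


def predict_expression(seq):
    """Hypothetical stand-in for a trained model: counts occurrences of MOTIF."""
    k = len(MOTIF)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == MOTIF)


def variant_effect(ref_seq, pos, alt_base):
    """Predicted change in expression (alt minus ref) for a SNV at 0-based pos."""
    if not 0 <= pos < len(ref_seq):
        raise ValueError("position outside sequence")
    if len(alt_base) != 1 or alt_base not in "ACGT":
        raise ValueError("alt_base must be one of A, C, G, T")
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return predict_expression(alt_seq) - predict_expression(ref_seq)


print(variant_effect("GGTATAAAGG", 3, "C"))  # -1: the variant disrupts the motif
print(variant_effect("GGTATCAAGG", 5, "A"))  # 1: the variant creates the motif
```

A negative score means the alternate allele is predicted to lower expression and a positive score means it is predicted to raise it, which is the kind of signal used to prioritize non-coding GWAS variants.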
Enformer is a breakthrough in regulatory genomics. These advances show that AI can play a considerably larger role in genome biology than previously thought, and further research in this field may open up new possibilities.
Dr. Tamanna Anwar is a Scientist and Co-founder of the Centre of Bioinformatics Research and Technology (CBIRT). She is a passionate bioinformatics scientist and a visionary entrepreneur. Dr. Tamanna has worked as a Young Scientist at Jawaharlal Nehru University, New Delhi. She has also worked as a Postdoctoral Fellow at the University of Saskatchewan, Canada. She has published several research articles in high-impact journals. Her latest endeavors are developing a platform that acts as a one-stop solution for all bioinformatics-related information and developing a bioinformatics news portal to report cutting-edge bioinformatics breakthroughs.