A group of scientists from the Department of Energy’s Oak Ridge National Laboratory (ORNL) and the Georgia Institute of Technology applied supercomputing and new deep learning techniques from Google’s DeepMind to predict the structures and functions of an organism’s entire proteome.

The deep learning-based algorithms developed by the team predict protein structure and function from DNA sequences. The work may advance biotechnology, biosecurity, and bioenergy, and could inform solutions to pollution and climate change.

The researchers’ approach combines the Summit supercomputer at ORNL with techniques developed by Google’s DeepMind and Georgia Tech to speed up the accurate identification of protein structures and functions across the whole genomes of species.

The computational method represents a major step toward solving a central biological problem: translating genetic information into predicted function.

Proteins are an essential part of the solution to this problem. They’re also crucial to answering many scientific questions concerning human, environmental, and global health. Proteins are the cell’s workhorses, driving practically every function required for life, from metabolism to immune defense to cell communication.

Determining a protein’s function from its sequence has been a bottleneck in the life sciences, because a protein’s structure typically must be known before its function can be established.

Thanks to breakthroughs in DNA sequencing technology, data for roughly 350 million protein sequences are now available, and this number keeps growing. Yet scientists have determined structures for only around 170,000 of those proteins, because of the significant experimental labor required to solve three-dimensional structures.

The researchers modeled the whole proteomes—all of the proteins encoded in an organism’s genome—of four bacteria, each with a proteome of roughly 5,000 proteins. Two of these bacteria produce compounds critical to the plastics industry, while the other two are involved in breaking down metals. The structural data might aid scientists in developing new synthetic biology processes and approaches for reducing pollutants like mercury in the environment.

The scientists also created models of the 24,000 proteins in sphagnum moss. Sphagnum is essential for storing massive amounts of carbon in peat bogs, which hold more carbon than all of the world’s forests combined. These findings might help scientists determine which genes are most crucial for sphagnum’s capacity to absorb carbon and endure climate change.

The team also looked for genes that allow sphagnum moss to endure rising temperatures by comparing its DNA sequences to those of Arabidopsis, a model plant species.
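The cross-species comparison described above rests on sequence alignment. As a minimal illustration (not the study’s actual pipeline, which operates at genome scale with specialized tools), here is a classic Needleman-Wunsch global alignment scorer for two short DNA fragments; the sequences and scoring parameters are made up for demonstration.

```python
# Minimal sketch of DNA sequence comparison via Needleman-Wunsch global
# alignment. Sequences and scores are illustrative, not from the study.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning the first i bases of a
    # with the first j bases of b
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[n][m]

# Identical fragments score highest; a single substitution lowers the score.
print(needleman_wunsch("ACGTACGT", "ACGTACGT"))  # 8 (all matches)
print(needleman_wunsch("ACGTACGT", "ACGAACGT"))  # 6 (one mismatch)
```

Higher scores indicate closer conservation between the two sequences, which is the signal researchers use to flag genes shared between sphagnum and Arabidopsis.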

These protein structures will aid in determining whether nucleotide modifications affect protein function. They will also help reveal whether protein alterations are assisting plants in surviving high-temperature environments.

Until recently, scientists lacked techniques to accurately predict protein structure from genetic sequences. With the advent of deep learning and high-performance computing, predicting structure and function directly from sequence has been revolutionized.

Though physical experiments and methods such as X-ray crystallography will continue to be necessary to determine protein structure and function, deep learning is shifting the paradigm by swiftly narrowing the enormous number of candidate genes to the most significant ones for further research.
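The triage step described above can be sketched in code. The idea is to rank genome-scale predictions by a per-residue confidence measure (analogous to the pLDDT score that AlphaFold reports) and shortlist only high-confidence candidates for experimental follow-up. The data, protein names, and threshold below are hypothetical, chosen purely for illustration.

```python
# Hedged sketch: narrow many candidate predictions to a shortlist by
# mean per-residue confidence. All values below are made up.

def mean_confidence(plddt_scores):
    """Average per-residue confidence for one predicted structure."""
    return sum(plddt_scores) / len(plddt_scores)

def shortlist(predictions, threshold=70.0):
    """Keep proteins whose mean confidence meets the threshold, best first."""
    ranked = sorted(predictions.items(),
                    key=lambda kv: mean_confidence(kv[1]), reverse=True)
    return [name for name, scores in ranked
            if mean_confidence(scores) >= threshold]

# Hypothetical per-residue confidence scores for three predicted structures.
predictions = {
    "proteinA": [92.1, 88.4, 90.0],   # confidently predicted
    "proteinB": [45.0, 50.2, 48.7],   # low confidence, likely unreliable
    "proteinC": [71.3, 69.9, 75.0],
}
print(shortlist(predictions))  # ['proteinA', 'proteinC']
```

A simple cutoff like this is how a deep learning pipeline can reduce thousands of predictions to a manageable set worth the cost of X-ray crystallography or other physical experiments.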

Story Source: Gao, M., Lund-Andersen, P., Morehead, A., Mahmud, S., Chen, C., Chen, X., … & Sedova, A. (2021, November). High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function. In 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) (pp. 46-57). IEEE.


