Advances in genome sequencing have greatly increased the amount of available sequence data. Despite this, the properties of many proteins still need to be interpreted. Machine learning methods, particularly deep learning methods, are used to train language models that predict protein properties. The Transformer model is a recent addition to natural language processing technology. PeSTo is a geometric transformer model that predicts protein-protein interactions by acting directly on the atoms of a protein structure.
Getting to Know Proteins
Proteins are made up of amino acid residues and are essential components of the body. They are involved in many biological processes, such as DNA replication, catalyzing metabolic reactions, transporting molecules, and responding to stimuli. Proteins perform the functions specified by the information encoded in genes. To function, proteins fold into different conformations. Studying the tertiary structure of a protein is essential to understanding the mechanisms behind biological processes, including diseases.
Protein primary structures are chains of amino acids that fold into 3D structures to form tertiary structures. Tertiary structures control the basic function of the protein. NMR spectroscopy, X-ray crystallography, and cryo-electron microscopy are a few wet lab techniques used to determine protein structures. Computational structure prediction by analyzing amino acid sequences can be of great help to existing lab procedures. Recent advances in machine learning techniques have been helpful in the area of protein structure prediction.
Machine Learning and Protein Structure Prediction
Technologies like neural networks, machine learning algorithms, and artificial intelligence have had a major impact on biology. Machine learning is based on the concept that systems can learn from input datasets, analyze them, and make decisions on their own. With resources such as big data, advanced computational hardware, and reliable toolkits, machine learning has made a great impact on protein structure prediction and protein homology modeling.
AlphaFold and its Limitations
AlphaFold, an open-source artificial intelligence-based system developed by DeepMind in association with EMBL-EBI, predicts a protein's tertiary structure from its amino acid sequence.
Apart from other proteins, a protein can also interact with nucleic acids (DNA and RNA), ions, and lipids. The recent version of the tool, AlphaFold 2, can predict some protein-protein interactions, but a more advanced algorithm is still needed to efficiently identify protein interactions with nucleic acids, lipids, and ions, because the tool is specifically trained to identify protein-protein interactions.
Most prediction tools target specific protein-to-protein interactions. Researchers are tackling this problem by developing new modeling algorithms. Special emphasis is placed on interactions between proteins and small molecules because of their clinical use in drug development. Developing such prediction tools requires complex calculations that are also time-consuming.
Keeping these problems in mind, researchers have developed a new neural network-based model, a transformer, that operates on the atoms of a protein to predict interaction interfaces. A transformer in computational biology refers to a deep learning model that can learn automatically through unsupervised learning. Primarily used in computer vision and natural language processing, transformer models can also be applied to protein structure prediction, protein-protein interactions, drug-target interactions, etc.
PeSTo (Protein Structure Transformer)
The researchers have come up with a new geometric transformer. It can accurately predict the likelihood that each amino acid of the query protein forms an interface with other proteins, nucleic acids, ions, lipids, etc. The transformer analyzes the 3D coordinates of the query protein and generates residue-specific scores. PeSTo (Protein Structure Transformer) is a parameter-free deep learning method that acts on atomic coordinates to predict biological interfaces in proteins.
PeSTo predicts protein interaction interfaces from a protein structure. It processes massive amounts of structural data and produces fast, highly accurate results. It is available as a free online tool and stands out among AI tools for protein structure analysis.
Mechanism
PeSTo treats the atoms of a protein structure as a point cloud whose geometry is represented by pairwise distances and relative displacement vectors. Because only relative positions are used, the representation is translation invariant. Each atom is described by its coordinates and element name and is associated with a scalar state and a vector state. PeSTo acts on this point cloud, updating the state of each central atom from its neighbors to generate the output. Unlike earlier approaches such as SE(3)-Transformers, PeSTo uses vectors to encode geometrical states.
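The geometric representation described above can be illustrated with a minimal NumPy sketch. It is not the PeSTo implementation: the function name `geometric_features` and the toy coordinates are assumptions made for illustration. It only shows that pairwise distances and relative displacement vectors stay the same when the whole structure is shifted in space.

```python
import numpy as np

def geometric_features(coords):
    """Return pairwise distances and relative displacement vectors for N atoms.

    Hypothetical helper for illustration; not part of PeSTo.
    """
    coords = np.asarray(coords, dtype=float)
    # disp[i, j] is the vector pointing from atom i to atom j
    disp = coords[None, :, :] - coords[:, None, :]
    # dist[i, j] is the Euclidean distance between atoms i and j
    dist = np.linalg.norm(disp, axis=-1)
    return dist, disp

# Toy point cloud: three atoms with made-up coordinates (angstroms)
elements = ["N", "C", "O"]
atoms = np.array([[0.0, 0.0, 0.0],
                  [1.5, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])

dist, disp = geometric_features(atoms)

# Shift the whole structure by an arbitrary vector and recompute
shift = np.array([10.0, -3.0, 7.5])
shifted_dist, shifted_disp = geometric_features(atoms + shift)

# Both features depend only on relative positions, so they are unchanged
translation_invariant = (np.allclose(dist, shifted_dist)
                         and np.allclose(disp, shifted_disp))
print(translation_invariant)
```

Absolute coordinates would change under the shift, which is why a model fed only these relative quantities does not depend on where the protein sits in the coordinate frame.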
The PeSTo model was trained using 300,000 protein chains from the Protein Data Bank. It predicts which residues of a protein take part in a protein-protein interface. The output is a value ranging from 0 to 1, where 0 indicates no interaction and 1 indicates participation in a protein-protein interface. Apart from identifying protein-protein interfaces, it can predict the likelihood of protein-nucleic acid (DNA and RNA) interfaces, protein-lipid interfaces, protein-ion interactions, and protein-ligand interactions. The predicted protein-DNA interface was found to be compatible with NMR results. PeSTo takes a tertiary protein structure as input, either from the Protein Data Bank, identified by its alphanumeric ID, or as a pre-computed protein model from UniProt.
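As a rough illustration of how such per-residue scores might be used downstream, the sketch below selects residues whose score reaches a cutoff. The residue labels, the score values, the function name `interface_residues`, and the 0.5 threshold are all assumptions made for the example. They are not PeSTo output or a cutoff recommended by its authors.

```python
def interface_residues(scores, threshold=0.5):
    """Return residue labels with score >= threshold, highest score first.

    `scores` maps a residue label to a value in [0, 1].
    The default threshold of 0.5 is an arbitrary illustrative choice.
    """
    for label, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {label} is outside [0, 1]: {score}")
    selected = [(label, score) for label, score in scores.items()
                if score >= threshold]
    # Sort by score so the most confident predictions come first
    selected.sort(key=lambda item: item[1], reverse=True)
    return [label for label, _ in selected]

# Made-up scores for five residues of a hypothetical query protein
example_scores = {
    "ALA12": 0.08,
    "ARG45": 0.91,
    "GLU46": 0.67,
    "LEU90": 0.32,
    "TYR101": 0.50,
}

predicted = interface_residues(example_scores)
print(predicted)  # ['ARG45', 'GLU46', 'TYR101']
```

A stricter threshold returns fewer, more confident residues, while a looser one trades precision for coverage.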
To validate the protein-protein interfaces predicted by PeSTo, the researchers compared them with those from other methods such as MaSIF (molecular surface interaction fingerprints) and AlphaFold-multimer, and observed that PeSTo performed as well as AlphaFold-multimer. Moreover, PeSTo was able to process large volumes of structural data in a short time.
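One common way to compare per-residue interface predictors is the area under the ROC curve (ROC AUC). The article does not state which metric the comparison used, so the sketch below is only an assumption about how such a comparison could be set up. The labels and the scores for `model_a` and `model_b` are made up and do not represent PeSTo, MaSIF, or AlphaFold-multimer.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative label")
    # Compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

# Made-up ground truth: 1 = residue is in an interface, 0 = it is not
true_interface = [1, 1, 0, 0, 1, 0]

# Made-up scores from two hypothetical predictors
model_a = [0.9, 0.8, 0.3, 0.1, 0.7, 0.2]
model_b = [0.9, 0.4, 0.5, 0.1, 0.7, 0.2]

auc_a = roc_auc(true_interface, model_a)
auc_b = roc_auc(true_interface, model_b)
print(auc_a, auc_b)
```

Here `model_a` ranks every interface residue above every non-interface residue, while `model_b` misorders one pair, so its AUC is lower.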
Advantages
- PeSTo does not require an additional calculation of the query protein's surface.
- The PeSTo model runs in milliseconds and can process vast structural information in a short duration.
- Since PeSTo is trained on atom position and element, it does not require classification or specific parameters to predict the protein-protein interface.
Conclusion
Conventionally, the properties of proteins are resolved by wet lab experiments, which are not only costly and time-consuming but can also be quite challenging. Computational prediction methods can reduce the time and resources required for lab experiments and let researchers focus on the data most relevant for further analysis. The recent availability of a large number of sequenced proteins, along with advancements in deep learning, has enabled researchers to understand and predict protein structure and interactions accurately. However, although existing models produce comprehensive outputs, the technology still needs further development.
Article Source: Reference Paper
Sipra Das is a consulting scientific content writing intern at CBIRT who specializes in the field of Proteomics-related content writing. With a passion for scientific writing, she has accumulated 8 years of experience in this domain. She holds a Master's degree in Bioinformatics and has completed an internship at the esteemed NIMHANS in Bangalore. She brings a unique combination of scientific expertise and writing prowess to her work, delivering high-quality content that is both informative and engaging.