Scientists from the Center for the Study of Systems Biology, Atlanta, in collaboration with Georgia Institute of Technology, Atlanta, and Oak Ridge National Laboratory, Oak Ridge, USA, have introduced a new method in which AlphaFold2 (AF2) is used to predict protein complex structures and protein-protein interactions. The method is expected to provide significant structural insights into various biological molecular systems.
Adapting AF2 allows the structure of protein complexes to be predicted at much higher accuracy than classical docking approaches, even when the docking methods use monomeric structures predicted by AF2.
AlphaFold2, a DeepMind-developed deep learning technique for predicting protein structure from a sequence, has significantly improved protein structure prediction. Because deep learning is a data-driven approach, the completeness of the structural space of single-domain proteins and the number of sequences in sequence databases are two significant aspects contributing to the success of AF2. These elements have combined to allow sophisticated neural network models to be trained for accurate protein structure prediction. AF2 not only performed well on single-domain targets, but it also performed well on multidomain proteins and has been used for such proteins in various model organisms.
Several proteins that form complexes in prokaryotes are fused into long, single-chain, multidomain proteins in eukaryotes. The physical forces that drive protein folding also drive protein-protein association. Understanding biological systems requires accurate descriptions of protein-protein interactions.
AlphaFold2 has recently produced highly accurate atomic structures for individual proteins. This study revealed that, without retraining, the same neural network models created for single protein sequences in AF2 can be repurposed to predict the structures of multimeric protein complexes.
Is it possible to modify AF2 to predict the structure of a protein complex?
Following the release of AF2, the search for an answer began almost immediately. One early work converted a two-chain structure prediction problem into a single-chain structure prediction problem by simply joining two protein sequences with a poly-glycine linker. A far better option is to change AF2’s “residue index” input feature, which removes the need for a covalent linker that is likely to cause artifacts.
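The difference between the two approaches can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 20-residue linker, and the 200-position gap are assumptions chosen for illustration. The idea is that, instead of inserting extra glycine residues, the numbering of the second chain is shifted so that the network sees the two chains as far apart in sequence.

```python
from typing import List


def concat_with_linker(seqs: List[str], linker_len: int = 20) -> str:
    """Early approach: fuse chains into one sequence with a poly-glycine linker.

    The linker residues are real amino acids in the model input, so they can
    distort the predicted complex.
    """
    return ("G" * linker_len).join(seqs)


def offset_residue_index(chain_lengths: List[int], gap: int = 200) -> List[int]:
    """Alternative: keep only the real residues and add a numbering jump.

    Returns one index per residue of the concatenated chains. Each new chain
    starts `gap` positions after the end of the previous one (assumed value).
    """
    index: List[int] = []
    start = 0
    for length in chain_lengths:
        index.extend(range(start, start + length))
        start += length + gap
    return index


if __name__ == "__main__":
    print(concat_with_linker(["MKV", "TAL"], linker_len=5))
    print(offset_residue_index([3, 2], gap=200))
```

With the index offset, no artificial residues enter the prediction; only the positional information changes.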
Meanwhile, research has been conducted using docking methods and models of single proteins created with AF2. These studies are predicated on the assumption that AF2 provides high-quality monomeric models that could boost the likelihood of native-like docking poses. As some scientists pointed out, one problem with these findings is that the benchmark sets include protein structures that were used to train the AF2 deep learning models. Although the AF2 models were not trained on protein complex structures, the use of holo monomers in training compromises rigor because AF2 presumably gives an “observed” holo-structure for docking.
Is it possible to adapt AF2 to predict protein-protein interactions and identify higher-order protein complexes?
Several high-throughput experimental approaches have been developed to detect interacting protein partners; however, their results are typically inconsistent and incomplete. On the computational side, template-based techniques have been employed, although they are limited to the discovery of homologs. Researchers have also used a combination of classic protein-protein docking approaches, co-evolutionary signals, and even deep learning models on entire proteomes.
These are effective methods, but they require paired multiple sequence alignments (MSAs) as inputs. Generating paired MSAs requires identifying orthologous sequences across species, which is difficult in many situations due to the occurrence of paralogs in eukaryotes, protein cross-talk in disease pathways, and pathogen-host interactions.
In this study, the scientists show that AF2 can be tailored to predict both the presence of protein-protein interactions and the related quaternary structures, using numerous test sets and without employing paired sequence alignments.
Deep Learning Model ‘AF2Complex’
For the provided query sequences of a target protein complex, the original AF2 data pipeline is first applied to collect input features for each query. Then, for complex structure prediction, AF2Complex assembles the individual monomer features.
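One plausible way to picture the assembly of monomer features without paired alignments is sketched below for the MSA feature alone. This is an illustrative assumption, not the AF2Complex implementation: the function name and the gap-padding layout are hypothetical, and the real pipeline handles many more features.

```python
from typing import List

GAP = "-"


def assemble_unpaired_msa(monomer_msas: List[List[str]]) -> List[str]:
    """Combine per-chain MSAs without pairing rows across chains.

    Each input MSA is a list of aligned sequences whose first entry is the
    query. The first output row is the concatenated queries. Every other row
    carries one chain's aligned sequence, with gap characters filling the
    columns that belong to the other chains.
    """
    lengths = [len(msa[0]) for msa in monomer_msas]
    rows = ["".join(msa[0] for msa in monomer_msas)]
    for i, msa in enumerate(monomer_msas):
        left = GAP * sum(lengths[:i])
        right = GAP * sum(lengths[i + 1:])
        for seq in msa[1:]:
            rows.append(left + seq + right)
    return rows


if __name__ == "__main__":
    chain_a = ["AC", "AD"]
    chain_b = ["EFG", "EFH", "QFG"]
    for row in assemble_unpaired_msa([chain_a, chain_b]):
        print(row)
```

Because no row has to contain matched sequences from the same species for both chains, this layout sidesteps the ortholog identification step that paired MSAs depend on.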
AF2Complex, in contrast to other methods, does not require paired multiple sequence alignments. It achieves higher accuracy than some complex protein-protein docking strategies and than AF-Multimer, the version of AlphaFold developed for multimeric proteins.
The scientists also provide metrics for predicting direct protein-protein interactions between arbitrary protein pairs and test AF2Complex against several challenging benchmark sets as well as the E. coli proteome. Finally, using cytochrome c biogenesis system I as an example, the study presents high-confidence models of three sought-after assemblies formed by eight members of this system.
Story Source: Gao, M., Nakajima An, D., Parks, J.M. et al. AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat Commun 13, 1744 (2022). https://doi.org/10.1038/s41467-022-29394-2