Scientists from the University of Washington, USA, studied how deep learning can improve de novo protein binder design and devised a deep learning-aided design protocol. It has recently become possible to design novel high-affinity protein-binding proteins using only the structural information of the target. However, the overall success rate of such designs is low, leaving considerable room for improvement. The authors explore augmenting energy-based protein binder design with deep learning. They find that the design success rate increases nearly tenfold when AlphaFold2 or RoseTTAFold is used to assess the probability that the designed binder structure binds the target.
Protein binder design: The story until now
Protein binders with high affinity for their targets are important in biomedicine, so methods for designing them are valuable for both therapeutics and diagnostics. The most prevalent methodologies involve either immunizing an animal with the target to generate antibodies or screening random complex libraries of antibodies or other scaffolds. Apart from being highly labor-intensive, these experiments offer little control over the properties of the binding molecules.
Computational approaches to protein binder design have improved significantly at producing binders with desirable biophysical properties. A recent Rosetta-based approach by Cao et al., which uses only the structure of the target, produced binding proteins for 13 different target sites by designing sequences predicted to fold into shapes complementary to the target region.
Deep learning-based approaches for protein binder design
Deep learning techniques have achieved unprecedented accuracy in protein structure prediction. AlphaFold2 and RoseTTAFold, the two most widely used deep learning-based approaches for protein structure prediction, incorporate hundreds of millions of parameters obtained by training on large datasets of protein sequences and structures. By contrast, methods such as Rosetta use energy functions with one or two thousand parameters derived from the biophysical properties of proteins and small molecules.
These deep-learning models learn iterative transformations of sequence and structure representations and generate highly accurate models.
The deep learning model DeepAccuracyNet (DAN) was developed to predict the accuracy of protein structure models. It achieved state-of-the-art performance in accuracy prediction in CASP14, and it uses a representation based on 3D convolutions over local atomic environments.
The authors hypothesized that these deep learning-based approaches could overcome the low success rate of Rosetta-based protein binder design.
Limitations of the Rosetta-based approach for protein binder design
The Rosetta-based approach has two main failure modes. First, the designed sequence may not fold to the intended monomer structure (a type I failure). Second, the intended structure may not bind to the target (a type II failure). This methodology frames the protein folding and binding problems in terms of energy functions: it requires both the designed monomer and the complex formed between that monomer and the target to be in their lowest energy states. Inaccuracies in the energy function, as well as incomplete sampling of the conformational space, may result in either the designed sequence not folding or the monomer structure not binding to the target.
The authors addressed these challenges by designing a deep learning-augmented protocol for de novo protein binder design. With both retrospective and prospective analyses, they show that the augmented protocol has a nearly tenfold higher success rate than the energy-based methodology.
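The tenfold figure is a ratio of success rates: the fraction of tested designs that bind after filtering, divided by the same fraction for the unfiltered energy-based pipeline. A minimal Python sketch of that arithmetic follows. The function names and the counts are made up for illustration and are not data from the study.

```python
def success_rate(n_binders, n_tested):
    """Fraction of experimentally tested designs that bind the target."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    if not 0 <= n_binders <= n_tested:
        raise ValueError("n_binders must lie between 0 and n_tested")
    return n_binders / n_tested


def fold_improvement(baseline, filtered):
    """Ratio of the filtered success rate to the baseline success rate.

    Each argument is a (n_binders, n_tested) pair.
    """
    base_rate = success_rate(*baseline)
    if base_rate == 0:
        raise ValueError("baseline has no binders, so the ratio is undefined")
    return success_rate(*filtered) / base_rate


# Toy counts chosen for illustration only, NOT results from the paper.
energy_only = (10, 10_000)    # 0.1% of all designs bind
with_dl_filter = (5, 500)     # 1% of the designs kept by the filter bind
print(f"{fold_improvement(energy_only, with_dl_filter):.1f}x improvement")
```

Note that a filter can raise the success rate while discarding some true binders, as in the toy counts above, so the ratio says nothing about how many binders are recovered in total.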
For the retrospective analysis, the authors investigated the ability of deep learning methods to discriminate between binders and non-binders in a set of approximately one million experimentally characterized designs for ten different targets, as reported by Cao et al.
The authors first examined type I failures, in which the designed sequence does not fold into the intended monomer structure, using both Rosetta and the deep learning methods. DAN was able to partially discriminate between binders and non-binders, whereas the Rosetta metric showed little discriminatory power. This is not surprising, because that same metric had already been used to stringently filter the input scaffolds for Rosetta.
The utility of AlphaFold2 for monomer structure modeling was tested by evaluating the ability of AlphaFold2 to predict the binder monomer structures from Cao et al., for which structures have already been solved experimentally. AlphaFold2 predicted these structures with very high accuracy.
Next, for the entire dataset from Cao et al., the structures predicted by AlphaFold2 or RoseTTAFold were compared with the Rosetta-designed structures. A disagreement would indicate a type I failure. The authors found that the closer the predicted structure was to the Rosetta-designed structure, the more likely the monomer was to bind the target. Type I failures therefore contribute to the low success rate of binder design, and binder design pipelines can raise their success rates by identifying discrepancies between design models and the structures predicted by AlphaFold2 or RoseTTAFold.
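One common way to quantify agreement between a design model and a predicted structure is the C-alpha RMSD after optimal superposition. The sketch below uses NumPy and the Kabsch algorithm. The article does not state which metric or cutoff the authors used, so `kabsch_rmsd`, `is_type_one_failure`, and the 2.0 Å cutoff are assumptions made for illustration.

```python
import numpy as np


def kabsch_rmsd(design_xyz, predicted_xyz):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(design_xyz, dtype=float)
    Q = np.asarray(predicted_xyz, dtype=float)
    if P.ndim != 2 or P.shape[1] != 3 or P.shape != Q.shape:
        raise ValueError("inputs must be (N, 3) arrays of matching shape")

    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Kabsch: the SVD of the covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so the result is a proper rotation,
    # not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    P_rot = P @ R.T  # rotate the design onto the prediction
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))


def is_type_one_failure(design_xyz, predicted_xyz, cutoff=2.0):
    """Flag a design whose prediction deviates from the design model.

    The 2.0 angstrom default is a hypothetical cutoff, not a value
    taken from the study.
    """
    return kabsch_rmsd(design_xyz, predicted_xyz) > cutoff
```

In practice the two coordinate arrays would hold the C-alpha atoms of the Rosetta design model and of the AlphaFold2 or RoseTTAFold prediction, in the same residue order.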
The authors then investigated whether AlphaFold2 or RoseTTAFold complex prediction could discriminate designs that form the intended complex structure from those that do not. Both AlphaFold2 and RoseTTAFold performed very well on this binder discrimination task. Using either method can therefore reduce type II failures and increase success rates for protein binder design.
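The article does not name the score used for this discrimination. AlphaFold2 does output a predicted aligned error (PAE) matrix alongside each structure, and averaging it over binder-target residue pairs is one plausible way to score how confidently the interface is predicted. The sketch below assumes the binder residues come first in the matrix. The function names and the 10.0 Å cutoff are hypothetical.

```python
import numpy as np


def interface_pae(pae, binder_len):
    """Mean predicted aligned error over binder-target residue pairs.

    pae is an (L, L) matrix for the complex; the first binder_len residues
    are assumed to be the binder and the remainder the target.
    """
    pae = np.asarray(pae, dtype=float)
    if pae.ndim != 2 or pae.shape[0] != pae.shape[1]:
        raise ValueError("pae must be a square matrix")
    if not 0 < binder_len < pae.shape[0]:
        raise ValueError("binder_len must leave at least one target residue")

    # The two off-diagonal blocks hold the inter-chain errors.
    binder_to_target = pae[:binder_len, binder_len:]
    target_to_binder = pae[binder_len:, :binder_len]
    return float((binder_to_target.mean() + target_to_binder.mean()) / 2.0)


def passes_interface_filter(pae, binder_len, cutoff=10.0):
    """Keep a design only if its interface PAE is below the cutoff.

    The default cutoff is an illustrative assumption, not a value
    reported in the article.
    """
    return interface_pae(pae, binder_len) < cutoff
```

A low interface PAE means the model is confident about where the binder sits relative to the target, which is the property a type II filter needs.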
Prospective analysis and using ProteinMPNN
AlphaFold2 performed as expected in the prospective tests, increasing the success rates for protein binder design. The authors also used ProteinMPNN to increase the computational efficiency of the binder design pipeline. The result is a deep learning-aided pipeline for the de novo design of protein binders.
The authors anticipate a further increase in design success rates, given the ongoing deep learning revolution. It will be interesting to see whether those gains come from deep learning-based approaches alone or from a combination of deep learning and energy-based methods. Either way, this protocol is useful for designing high-affinity binders and could have a significant impact on therapeutics as well as diagnostics.
Article Source: Reference Paper | Reference Article
Banhita is a consulting scientific writing intern at CBIRT. She's a mathematician turned bioinformatician. She has gained valuable experience in bioinformatics while working at esteemed institutions like KTH, Sweden, and NCBS, Bangalore. Banhita holds a Master's degree in Mathematics from the prestigious IIT Madras, as well as the University of Western Ontario in Canada. She is deeply passionate about scientific writing, making her an invaluable asset to any research team.