Researchers from the University of Illinois Urbana-Champaign introduce APACE (AlphaFold2 and advanced computing as a service), a computational framework that efficiently manages this AI model and its TB-sized database and optimizes AlphaFold2 to run at scale on high-performance computing systems. They show how APACE, deployed on the Delta and Polaris supercomputers, speeds up structure prediction for a range of proteins, and how it cuts time-to-insight from days to minutes by distributing 200 ensembles across 300 NVIDIA A100 GPUs. The platform can be integrated with autonomous labs to support large-scale automated discovery.

Introduction

Innovation at the nexus of advanced computing and artificial intelligence is driving scientific and engineering advances. AI models such as AlphaFold and GPT-4 open up new possibilities for automating and accelerating scientific discovery. However, some of these models have not been released to the general public, breaking a long-standing custom in the AI community. Some have also suggested that the sheer size of these models puts them out of reach for many potential users.

To overcome this limitation, the researchers show how to integrate massive AI models with high-performance computing systems so that a diverse range of users can fully exploit AI’s potential for scientific study. They chose AlphaFold2 as the science driver for this project because it is transforming biophysics research and requires the best possible use of modern supercomputing environments for accurate, fast protein structure prediction (PSP). They show how to optimize AlphaFold2 and its database, which occupies over 2.6 TB of storage, to reduce the time needed to produce accurate PSPs from weeks to minutes.

About AlphaFold2

AlphaFold2 is a deep learning-based protein structure prediction system that demonstrated exceptional performance at the 14th Community Wide Experiment on Protein Structure Prediction (CASP14). It has since been extended with multimer prediction and has been used to predict protein complexes such as antibodies and to sample diverse protein conformations. This study uses AlphaFold2 version 2.3.0 with pre-trained neural network parameters, including the monomer and multimer v3 models. In the CPU phase, AlphaFold2 computes its key input features: the multiple sequence alignment (MSA) and structural templates. The MSA is a set of sequences homologous to the query protein that captures the evolutionary relationships between proteins. It is built by aligning the query sequence against known homologs from databases such as Uniclust, using CPU-based sequence alignment tools such as Jackhmmer. Structural templates, which are experimentally determined structures of homologous proteins, further improve the accuracy of AlphaFold2’s predictions. These templates are taken from publicly accessible protein structure repositories such as the Protein Data Bank.
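To make the idea of an MSA more concrete, here is a minimal Python sketch that scores how conserved each column of a tiny alignment is. The sequences are made up for illustration, and the `column_conservation` helper is hypothetical; it is not part of AlphaFold2 or APACE, which derive much richer features from real alignments.

```python
from collections import Counter

# Toy alignment: a query plus three made-up homologs, already aligned.
# '-' marks a gap. These sequences are illustrative, not real proteins.
msa = [
    "MKTAYIAK",  # query
    "MKTAYLAK",
    "MRTA-IAK",
    "MKSAYIAR",
]

def column_conservation(alignment):
    """Per column, the fraction of sequences that share the most common residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gaps
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores

scores = column_conservation(msa)
for position, score in enumerate(scores, start=1):
    print(f"column {position}: {score:.2f}")
```

Columns where every homolog agrees score 1.0, while columns with substitutions or gaps score lower. Patterns of this kind across many homologs are the evolutionary signal the MSA carries.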

In the GPU phase, AlphaFold2 uses the MSA features and templates to predict the protein’s 3D structure. The Evoformer network refines representations of amino acid residue interactions through repeated information exchange. A structure module then converts the refined representations into translations and rotations for each residue. After a structure is produced, the outputs are cycled back to the start of the Evoformer blocks to refine the prediction further. Finally, a molecular dynamics engine relaxes the resulting 3D structure through energy minimization to improve accuracy.
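The control flow of the GPU phase can be sketched in a few lines of Python. This is only an outline of the order in which the stages run: the function name, the placeholder outputs, and the default of three recycling passes are assumptions made for illustration, and no real neural network or relaxation is performed.

```python
def run_gpu_phase(msa_features, template_features, num_recycles=3):
    """Return the order in which the stand-in stages run (no real computation)."""
    trace = []
    recycled = None  # nothing to feed back on the first pass
    for cycle in range(num_recycles + 1):  # first pass plus the recycling passes
        inputs = (msa_features, template_features, recycled)
        trace.append(f"evoformer (pass {cycle})")
        representation = f"representation from pass {cycle}"  # placeholder output
        trace.append(f"structure module (pass {cycle})")
        structure = f"structure from pass {cycle}"  # placeholder output
        recycled = (representation, structure)  # fed back to the Evoformer input
    trace.append("relaxation")  # applied once, to the final structure
    return trace

for step in run_gpu_phase("msa", "templates", num_recycles=2):
    print(step)
```

The point of the sketch is the loop shape: the Evoformer and structure module run repeatedly with the previous outputs fed back in, and relaxation happens once at the end.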

Understanding APACE

The researchers present APACE (AlphaFold2 and advanced computing as a service), a computational framework that accelerates AlphaFold2 through CPU and GPU optimizations and distributed computing in supercomputing environments. AI models such as AlphaFold play an important role in applications that depend on robust protein structure prediction, from drug discovery to genome interpretation. APACE is offered as a service to make these tools easier to use and to increase their impact. The framework handles the AI model and its TB-sized database effectively, enabling accelerated protein structure prediction analyses on modern supercomputing systems. Tested on four exemplar proteins, APACE was up to two orders of magnitude faster than off-the-shelf AlphaFold2 implementations, reducing time-to-solution from weeks to minutes. The approach can be integrated with robotic labs to streamline and accelerate scientific exploration.

Prediction of Protein Structures by APACE

The researchers assessed APACE’s protein structure prediction performance and operational efficiency. Jobs were configured with Simple Linux Utility for Resource Management (SLURM) parameters, which were used to load the proper environment and modules for runs on the Delta and Polaris supercomputers. As with AlphaFold2, the remaining parameters covered neural network and MSA/template-related settings. APACE’s parallel optimizations sped up the CPU stage by an average of 1.8X on Delta and 1.78X on Polaris, independent of the number of compute nodes. The GPU stage also showed considerable speedups, using 8 GPUs for 6AWO and 40 GPUs for 6OAN, 7MEZ, and 6D6U. For 6AWO, the reported GPU speedup was 4.4X on Delta, with A40 and A100 GPUs, and 4.98X on Polaris.
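Because the CPU and GPU stages run one after the other, the per-stage speedups do not simply add or multiply. The sketch below shows the arithmetic under stated assumptions: the baseline stage times (60 and 40 minutes) are hypothetical, and only the 1.8X and 4.4X factors come from the figures quoted above. The function names are illustrative.

```python
def speedup(baseline_time, optimized_time):
    """Speedup factor: how many times faster the optimized run is."""
    return baseline_time / optimized_time

def overall_speedup(cpu_time, gpu_time, cpu_speedup, gpu_speedup):
    """Combined speedup when the CPU and GPU stages run sequentially."""
    baseline = cpu_time + gpu_time
    optimized = cpu_time / cpu_speedup + gpu_time / gpu_speedup
    return speedup(baseline, optimized)

# Hypothetical baseline stage times in minutes; only the speedup factors
# (1.8X for the CPU stage, 4.4X for the GPU stage) are taken from the article.
cpu_minutes, gpu_minutes = 60.0, 40.0
total = overall_speedup(cpu_minutes, gpu_minutes, cpu_speedup=1.8, gpu_speedup=4.4)
print(f"overall speedup: {total:.2f}x")
```

With these made-up baseline times, the slower-to-accelerate CPU stage dominates the optimized run, so the combined factor lands between the two per-stage factors, closer to the CPU one.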

Additionally, the results show that NVIDIA A100 GPUs consistently yield faster prediction times. For both simple and intricate structures, APACE delivers substantial speedups while maintaining the accuracy and stability of the original AlphaFold2 model, and it can use hundreds of GPUs for extensive analyses.
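To illustrate why spreading independent ensemble predictions over many GPUs shortens wall-clock time, here is a minimal Python sketch of round-robin task assignment. It is a generic scheduling illustration, not APACE's actual scheduler, which the article does not describe; the function names and the 10-minute per-prediction time are assumptions.

```python
import math

def assign_round_robin(num_tasks, num_gpus):
    """Map each GPU index to the list of task indices it will run."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    assignment = {gpu: [] for gpu in range(num_gpus)}
    for task in range(num_tasks):
        assignment[task % num_gpus].append(task)
    return assignment

def estimated_wall_time(num_tasks, num_gpus, minutes_per_task):
    """Wall time if every task takes the same time and GPUs work in parallel."""
    rounds = math.ceil(num_tasks / num_gpus)
    return rounds * minutes_per_task

# 200 ensemble predictions; 10 minutes per prediction is a hypothetical figure.
for gpus in (1, 8, 300):
    minutes = estimated_wall_time(200, gpus, minutes_per_task=10)
    print(f"{gpus} GPUs: about {minutes} minutes")
```

Once there are at least as many GPUs as predictions, every prediction runs in the first round, and the wall time falls to that of a single prediction. Real runs would also include data staging and scheduling overheads that this sketch ignores.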

Conclusion

APACE is a scientific framework that uses supercomputing to streamline data staging and storage on the Polaris and Delta supercomputers, cutting down time-to-insight. By optimizing both CPU and GPU computation, it makes it possible to predict conformational ensembles of protein structures. Its straightforward integration with robotic laboratories gives researchers a computational framework for efficient research and accelerates scientific discovery.

Article Source: Reference Paper | Reference Article | Data, Materials, and Software Availability: GitHub.

Deotima

Deotima is a consulting scientific content writing intern at CBIRT. She is currently pursuing a Master's in Bioinformatics at Maulana Abul Kalam Azad University of Technology. As an emerging scientific writer, she is eager to apply her expertise to making intricate scientific concepts comprehensible to people from diverse backgrounds. Deotima has a particular passion for Structural Bioinformatics and Molecular Dynamics.