For more than fifty years, biomolecular simulation has been dominated by classical empirical force fields. Although they are widely used in biomolecular dynamics, crystal structure prediction, and drug discovery, they typically lack the accuracy and transferability needed for predictive modeling. In this paper, researchers from the University of Cambridge introduce MACE-OFF23, a transferable force field for organic molecules that combines high-level quantum mechanical reference data with state-of-the-art machine learning to predict a range of properties of molecular systems. It provides reliable descriptions of molecular crystals and liquids, accurate dihedral torsion scans, and nuclear quantum effects. MACE-OFF23 enables accurate, inexpensive simulations at near first-principles quality: the authors use it to compute free energy surfaces, simulate a fully solvated small protein, and study the folding dynamics of peptides.

Introduction

Machine learning (ML) force fields have recently improved substantially in accuracy, robustness, and computational speed. ML potentials are now routinely used in materials chemistry, where density functional theory used to be the method of choice. In these applications, many scientifically interesting and challenging phenomena cannot be described adequately by the available empirical force fields, such as the embedded-atom method, because those force fields lack accuracy and transferability. Successful applications of ML potentials include phase diagrams of inorganic perovskites and alloys, device-scale simulations of phase-change memory materials, and simulations of the quenching of amorphous silicon. Simulating bio-organic systems, by contrast, involves different trade-offs, with a stronger emphasis on reaching long simulation timescales.

Knowing about Force Fields

For small molecules, semi-empirical quantum mechanics is cheaper than ab initio quantum chemistry methods, but its accuracy is only moderate. Electrostatic and dispersion interactions have been incorporated into transferable machine learning force fields for organic chemistry, such as the ANI and AIMNet potentials. ANI-2x, the most widely used ML force field, has been coupled with a polarizable electrostatic model in a hybrid ML/MM simulation setup. AIMNet models use a message-passing architecture built on ANI symmetry functions, which extends their applicability to charged species and a wider range of chemical elements. The PhysNet model combines a message-passing framework with long-range electrostatic and dispersion interactions.

The FENNIX model combines a local equivariant machine learning model with a physical long-range functional form for dispersion and electrostatics; however, more extensive benchmarking is needed to assess its accuracy. The ANA2B potential combines a short-ranged machine learning (ML) potential with long-range classical multipolar electrostatics, polarization, and dispersion interactions; however, its computational efficiency and accuracy for larger biomolecules have not yet been demonstrated. Another innovative ML force field for biomolecular simulations is the GEMS model, which is based on the SpookyNet architecture. However, to obtain a stable model, each new system requires extensive reference quantum chemistry calculations on relevant long peptide segments.

MACE-OFF23 Unveiled

This work introduces MACE-OFF23, a series of three purely local, transferable machine learning force fields for bio-organic systems. The force fields are parameterized for the ten chemical elements most important in organic chemistry: H, C, N, O, F, P, S, Cl, Br, and I.

The models accurately describe both intra- and intermolecular interactions of neutral, closed-shell systems. This makes it possible to simulate a wide variety of chemical systems, including biopolymers, drug-like molecules, and molecular liquids and crystals.

The models were validated on several tasks, such as the calculation of lattice parameters and enthalpies of formation of molecular crystals, geometry optimizations, the prediction of small molecule torsion barriers, and the calculation of the Raman spectra of molecular crystals, including nuclear quantum effects. 
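
A torsion scan drives one dihedral angle through a series of values and records the energy at each step. The minimal sketch below shows the generic geometry (the standard atan2 formulation of the dihedral angle) and a rigid scan loop. The four-atom chain, `toy_energy`, and its 12 kJ/mol threefold barrier are hypothetical stand-ins for a real molecule and a real force-field call; this is not code from the MACE-OFF23 study.

```python
import math
from typing import Callable, List, Tuple

Vec = Tuple[float, float, float]
Quad = Tuple[Vec, Vec, Vec, Vec]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180 (atan2 formulation)."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    n = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def scan_geometry(theta_deg: float) -> Quad:
    """Hypothetical four-atom chain: rotate the last atom about the p1->p2 (z) axis."""
    t = math.radians(theta_deg)
    return ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
            (math.cos(t), math.sin(t), 1.0))

def toy_energy(positions: Quad, v3: float = 12.0) -> float:
    """Hypothetical stand-in for a force-field energy call: threefold cosine torsion term."""
    phi = math.radians(dihedral(*positions))
    return 0.5 * v3 * (1.0 + math.cos(3.0 * phi))

def torsion_scan(energy_fn: Callable[[Quad], float], step_deg: int = 10) -> List[Tuple[int, float]]:
    """Evaluate the energy on a grid of dihedral angles (rigid scan, no relaxation)."""
    return [(theta, energy_fn(scan_geometry(theta))) for theta in range(-180, 180, step_deg)]

scan = torsion_scan(toy_energy)
energies = [e for _, e in scan]
barrier = max(energies) - min(energies)  # torsion barrier height from the scan
```

This is a rigid scan for brevity; published torsion benchmarks usually relax all other degrees of freedom at each constrained dihedral value, which would replace `scan_geometry` with a constrained geometry optimization.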

The researchers also tested the models' predictions of the densities and heats of vaporization of a range of molecular liquids. In particular, they examined how well MACE-OFF23 reproduces two fundamental properties of water: its density and its radial distribution functions. To demonstrate the model's capabilities, they simulated the folding of Ala15 at several temperatures and computed the free energy surface of the alanine tripeptide both in explicit water and in vacuum. They also ran an all-atom simulation of the protein crambin in explicit water (18,000 atoms), in which the expected secondary structure was recovered and the computed vibrational spectrum agreed well with previously published experimental results. Lastly, they benchmarked the computational speed of the current MACE-OFF23 implementation in the OpenMM and LAMMPS simulation packages.
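
The two water observables mentioned here can be sketched generically. Below, `mass_density_g_cm3` converts a box size and molecule count into a density, and `rdf` histograms minimum-image pair distances in a cubic periodic box into g(r). These are textbook formulas, not the authors' analysis scripts. The 216-molecule, 18.6 Å box used in the check is a hypothetical example, and a real analysis would average g(r) (for example, O–O) over many trajectory frames.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
AVOGADRO = 6.02214076e23  # 1/mol

def mass_density_g_cm3(n_molecules: int, molar_mass_g_mol: float, box_angstrom: float) -> float:
    """Mass density of a cubic simulation box whose side length is given in Å."""
    volume_cm3 = (box_angstrom * 1e-8) ** 3  # 1 Å = 1e-8 cm
    return n_molecules * molar_mass_g_mol / AVOGADRO / volume_cm3

def rdf(positions: Sequence[Vec], box: float, r_max: float, n_bins: int) -> List[Tuple[float, float]]:
    """g(r) for one frame of identical particles in a cubic periodic box (requires r_max <= box / 2)."""
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for k in range(3):
                d = positions[i][k] - positions[j][k]
                d -= box * round(d / box)  # minimum-image convention
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 2  # each pair counts for i and for j
    density = n / box ** 3
    g = []
    for b, c in enumerate(counts):
        r_lo, r_hi = b * dr, (b + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        # Normalize by the count expected for an ideal gas at the same density.
        g.append(((r_lo + r_hi) / 2.0, c / (n * density * shell)))
    return g
```

For uncorrelated (ideal-gas) positions, g(r) fluctuates around 1; structure in a liquid shows up as peaks and troughs relative to that baseline.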

Looking into the MACE Model Architecture

The MACE model is a force field that maps the positions and chemical elements of the atoms to the potential energy of the system. It achieves linear scaling by decomposing the total energy into a sum of atomic site energies. The atoms form the nodes of a graph, and two nodes are connected by an edge if and only if they lie in each other's local environment, that is, within the cutoff radius. The features of each node are expressed in a spherical harmonic basis. Every MACE-OFF23 model consists of two layers; in the first layer, the node features are determined by the atoms' chemical elements.
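
The cutoff graph and the site-energy decomposition E = Σ_i E_i can be illustrated with a minimal sketch. `toy_site_energy`, the 5 Å `CUTOFF`, and the `E_REF` reference energies are hypothetical placeholders for the learned MACE readout. Only the structure reflects the text: each E_i depends solely on the atoms within the cutoff of atom i.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float, float]
Environment = List[Tuple[str, float]]  # (element of neighbour j, distance r_ij)

CUTOFF = 5.0  # Å, hypothetical value for this sketch
E_REF = {"H": -13.6, "O": -2041.0}  # hypothetical per-element reference energies (eV)

def neighbour_lists(positions: Sequence[Vec], cutoff: float) -> List[List[int]]:
    """Edges of the atomic graph: j is a neighbour of i iff 0 < |r_i - r_j| < cutoff.

    O(N^2) search for clarity; production codes use cell or Verlet lists to stay O(N).
    """
    nbrs: List[List[int]] = [[] for _ in positions]
    for i, ri in enumerate(positions):
        for j, rj in enumerate(positions):
            if i != j and math.dist(ri, rj) < cutoff:
                nbrs[i].append(j)
    return nbrs

def toy_site_energy(element: str, env: Environment) -> float:
    """Hypothetical stand-in for the learned E_i; neighbour terms vanish smoothly at the cutoff."""
    return E_REF[element] + sum(-0.1 * (1.0 - r / CUTOFF) ** 2 for _, r in env)

def total_energy(elements: Sequence[str], positions: Sequence[Vec],
                 site_energy: Callable[[str, Environment], float] = toy_site_energy) -> float:
    """E = sum_i E_i, where each E_i sees only atoms within CUTOFF of atom i."""
    nbrs = neighbour_lists(positions, CUTOFF)
    energy = 0.0
    for i, z in enumerate(elements):
        env = [(elements[j], math.dist(positions[i], positions[j])) for j in nbrs[i]]
        energy += site_energy(z, env)
    return energy
```

Because every atom has a bounded number of neighbours inside the cutoff, the cost of the sum grows linearly with the number of atoms once a cell list replaces the all-pairs search. The same locality means that an atom placed far beyond the cutoff adds only its own isolated site energy.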

The one-particle basis is built from the interatomic displacement vectors and the features of neighboring atoms. Contracting the product basis with generalized Clebsch-Gordan coefficients yields the equivariant higher-order basis, B_i. The node features are then updated recursively by combining the resulting message with each atom's features from the previous iteration.
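
The following drastically reduced toy version of this pipeline is not the MACE implementation. It keeps only one scalar feature per atom, spherical harmonics of degree l = 0 and l = 1, and products up to correlation order 2, whereas MACE-OFF23 uses many channels and a higher correlation order. The radial function and weights are hypothetical. It relies on two standard facts: degree-1 real spherical harmonics are proportional to the components of the unit vector, and the Clebsch-Gordan coupling l = 1 ⊗ l = 1 → L = 0 reduces, up to a constant, to a dot product. The result is a rotation-invariant update.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

def radial(r: float, cutoff: float) -> float:
    """Hypothetical smooth radial function R(r) that goes to zero at the cutoff."""
    return (1.0 - r / cutoff) ** 2 if r < cutoff else 0.0

def one_particle_basis(i: int, positions: Sequence[Vec], feats: Sequence[float],
                       cutoff: float) -> Tuple[float, List[float]]:
    """A_i = sum_j R(r_ij) Y_lm(r_hat_ij) h_j for l = 0 (scalar) and l = 1 (3-vector).

    The constant Y_00 is absorbed into the weights; degree-1 harmonics are taken as
    the unit-vector components. Atoms are assumed to sit at distinct positions.
    """
    a0, a1 = 0.0, [0.0, 0.0, 0.0]
    for j, rj in enumerate(positions):
        if j == i:
            continue
        d = [rj[k] - positions[i][k] for k in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        w = radial(r, cutoff) * feats[j]
        a0 += w
        for k in range(3):
            a1[k] += w * d[k] / r
    return a0, a1

def invariant_products(a0: float, a1: List[float]) -> List[float]:
    """Rotation-invariant (L = 0) entries of the product basis up to correlation order 2.

    Coupling l = 1 with l = 1 to L = 0 is, up to normalization, the dot product a1 . a1.
    """
    return [a0, a0 * a0, sum(c * c for c in a1)]

def update(positions: Sequence[Vec], feats: Sequence[float], cutoff: float,
           weights: Sequence[float]) -> List[float]:
    """One message-passing step: h_i(t+1) = h_i(t) + sum_k w_k * B_ik (residual update)."""
    new = []
    for i in range(len(positions)):
        b = invariant_products(*one_particle_basis(i, positions, feats, cutoff))
        message = sum(w * bk for w, bk in zip(weights, b))
        new.append(feats[i] + message)
    return new
```

Rotating or translating the whole configuration leaves the updated features unchanged, which is the property the Clebsch-Gordan contraction is there to guarantee.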

The training set is based on the SPICE dataset, with 95% of the data used for training and validation and 5% held out for testing. MACE-OFF23 is trained to reproduce energies and forces computed with the PSI4 program at the ωB97M-D3(BJ)/def2-TZVPPD level of quantum mechanical theory. The dataset contains the subset of SPICE restricted to the ten chemical elements listed above and to molecules with a neutral formal charge. To improve the description of intramolecular non-bonded interactions, larger molecules with 50–90 atoms from the QMugs dataset were added, along with water clusters extracted from molecular dynamics simulations of liquid water. To clean the dataset, configurations with a maximum force error greater than 2 eV/Å were removed from the training set.
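
The filtering and splitting steps described here can be sketched as follows. The record layout (the `elements`, `charge`, and `max_force_error` keys) is a hypothetical assumption, and the sketch does not define how the per-configuration force error is computed. Only the element set, the neutral-charge criterion, the 2 eV/Å threshold, and the 95/5 proportions come from the text.

```python
import random
from typing import Dict, List, Sequence, Tuple

ALLOWED_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}
MAX_FORCE_ERROR = 2.0  # eV/Å, threshold quoted in the text

def keep(config: Dict) -> bool:
    """Keep neutral configurations made only of allowed elements and below the force threshold.

    The dictionary keys used here are hypothetical; adapt them to the real data format.
    """
    return (set(config["elements"]) <= ALLOWED_ELEMENTS
            and config["charge"] == 0
            and config["max_force_error"] <= MAX_FORCE_ERROR)

def split(configs: Sequence[Dict], test_fraction: float = 0.05,
          seed: int = 0) -> Tuple[List[Dict], List[Dict]]:
    """Filter, shuffle reproducibly, and hold out `test_fraction` of the kept configurations."""
    kept = [c for c in configs if keep(c)]
    order = list(range(len(kept)))
    random.Random(seed).shuffle(order)
    n_test = round(test_fraction * len(kept))
    test = [kept[k] for k in order[:n_test]]
    train_val = [kept[k] for k in order[n_test:]]
    return train_val, test
```

Splitting by configuration is an assumption of this sketch. Splitting by molecule instead would keep different conformers of the same molecule from appearing in both the training and test sets.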

The MACE architecture lets the expressivity of the model be tuned, so that simulations can be run at the lowest computational cost compatible with the required accuracy. Three variants with different accuracy levels and cutoffs are presented: MACE-OFF23(S), MACE-OFF23(M), and MACE-OFF23(L). The small model is suited to large-scale simulations, while the medium model strikes a balance between speed and accuracy. All three models have two layers and a body order of four.
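
The meaning of "body order four" can be made concrete with a schematic, simplified form of the MACE equations. The neighbour features h_j are shown as a single channel-indexed scalar, and the full radial and coupling indices are abbreviated. Each factor A contributes one neighbour j, so a product of ν factors correlates the central atom with up to ν neighbours at once and yields (ν + 1)-body terms. A body order of four therefore corresponds to a correlation order of ν = 3 in each layer.

```latex
E = \sum_{i=1}^{N} E_i,
\qquad
A_{i,klm}^{(t)} = \sum_{j \in \mathcal{N}(i)} R_{kl}^{(t)}(r_{ij})\, Y_l^m(\hat{\mathbf{r}}_{ij})\, h_{j,k}^{(t)},
\qquad
B_{i,\eta kLM}^{(t)} = \sum_{\mathbf{l}\mathbf{m}} \mathcal{C}_{\eta,\mathbf{l}\mathbf{m}}^{LM} \prod_{\xi=1}^{\nu} A_{i,k l_\xi m_\xi}^{(t)},
\qquad
\text{body order} = \nu + 1 = 4 .
```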

Applications

MACE-OFF23 lets researchers explore a broader range of chemistry, for example by computing vibrational spectra and reproducing accurate secondary structure. It can also be used to study the folding dynamics of peptides and to compute free energy surfaces in explicit solvent.
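
A free energy surface over two backbone dihedrals can be estimated from an equilibrium trajectory as F(φ, ψ) = −k_B T ln P(φ, ψ). The minimal sketch below assumes unbiased sampling. Enhanced-sampling runs would need reweighting first, and the source does not say which approach the authors used. The 10° bin width is an arbitrary choice.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

K_B = 0.008314462618  # Boltzmann constant in kJ/(mol K)

def free_energy_surface(samples: Iterable[Tuple[float, float]], temperature: float,
                        bin_deg: float = 10.0) -> Dict[Tuple[int, int], float]:
    """F(phi, psi) = -k_B T ln P(phi, psi) in kJ/mol, shifted so the most populated bin is 0.

    `samples` are (phi, psi) pairs in degrees from an unbiased equilibrium trajectory.
    Empty bins are omitted because finite sampling leaves their free energy undefined.
    """
    counts = Counter((math.floor(phi / bin_deg), math.floor(psi / bin_deg))
                     for phi, psi in samples)
    n_max = max(counts.values())
    kt = K_B * temperature
    # Dividing by n_max rather than the total count only shifts F by a constant.
    return {b: -kt * math.log(c / n_max) for b, c in counts.items()}
```

For example, a bin visited one third as often as the most populated bin lies k_B T ln 3 (about 2.7 kJ/mol at 300 K) higher in free energy.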

MACE-OFF23 accurately predicts a broad range of gas- and condensed-phase properties of molecular systems, demonstrating the potential of purely local, short-range models. It provides reliable descriptions of molecular liquids and crystals, including nuclear quantum effects, as well as accurate, converged dihedral torsion scans of previously unseen molecules.

Conclusion

The MACE-OFF23 models, which are built on the MACE higher-order equivariant message-passing architecture, offer accuracy, computational speed, and extrapolation capability that are highly advantageous for simulating organic molecules. Because the models are purely local, they are currently best suited to neutral, non-radical, non-reactive systems, but they can be extended to charged species and long-range interactions. The ANI models faced a similar restriction, and AIMNet-2 works around it by extending the ANI approach to charged species and long-range interactions. Future MACE-OFF models will likewise include explicit charges in order to model biologically relevant systems.

Article Source: Reference Paper

Important Note: bioRxiv releases preprints that have not yet undergone peer review. As a result, it is important to note that these papers should not be considered conclusive evidence, nor should they be used to direct clinical practice or influence health-related behavior. It is also important to understand that the information presented in these papers is not yet considered established or confirmed.

Deotima Chakraborty
