Scientists used a deep learning approach to create a ‘Deep Neural Network Classifier’ that can distinguish between physiologic and accidental zinc-binding sites in the 3D structure of metalloproteins.
About 38% of the protein structures in the Protein Data Bank contain at least one metal ion. However, not all of these metal sites have biological significance.
Protein-metal complexes that do not exist in vivo can form when cations are present as impurities in the crystallization buffer or are introduced during sample preparation.
To create a deep neural network classifier that can distinguish between physiologic and accidental zinc-binding sites in the 3D architectures of metalloproteins, the scientists used a deep learning approach.
They trained the classifier on manually annotated sites taken from the MetalPDB database. In 10-fold cross-validation, the classifier attained an accuracy of roughly 90%.
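For readers unfamiliar with the procedure, the sketch below illustrates how a 10-fold cross-validated accuracy is computed. It is a minimal illustration, not the authors' pipeline: the data are a synthetic toy set, and the hypothetical `fit_threshold` learner (a simple cut-off on the number of protein ligands) stands in for the deep neural network.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def cross_validated_accuracy(samples, labels, fit, k=10, seed=0):
    """Average test accuracy over k folds; fit(x, y) must return a predict function."""
    fold_accuracies = []
    for train_idx, test_idx in k_fold_splits(len(samples), k, seed):
        predict = fit([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(samples[i]) == labels[i] for i in test_idx)
        fold_accuracies.append(correct / len(test_idx))
    return sum(fold_accuracies) / k

def fit_threshold(train_x, train_y):
    """Hypothetical stand-in learner: pick the cut-off with the best training accuracy."""
    best = max(sorted(set(train_x)),
               key=lambda t: sum((x >= t) == y for x, y in zip(train_x, train_y)))
    return lambda x: x >= best

# Synthetic toy data (NOT from MetalPDB): number of protein ligands per site,
# labelled physiological (True) when the site has three or more ligands.
toy_ligand_counts = [1, 2, 3, 4, 5, 6] * 10
toy_labels = [count >= 3 for count in toy_ligand_counts]

print(cross_validated_accuracy(toy_ligand_counts, toy_labels, fit_threshold))
```

Each site is used for testing exactly once and for training in the other nine rounds, so the reported accuracy is always measured on sites the model did not see during fitting.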
The rules acquired on zinc sites may have general application because the same deep neural network classifier could predict the physiological importance of non-heme mononuclear iron sites with an accuracy of approximately 80%.
The scientists deduced some general rules by calculating the relative weights of the attributes describing the input zinc sites from the network perspective and by scrutinizing the features of the MetalPDB datasets.
In physiological sites, the amino acids that form coordinate bonds with the metal ion (the metal ligands) have limited solvent accessibility, the metal environment contains a significant number of residues, and there is a clear pattern of conservation of cysteine and histidine residues in the site.
Adventitious sites, on the other hand, often have only a small number of donor atoms from the polypeptide chain (often one or two). These findings assist in determining whether novel metal-binding sites in 3D protein structures have any physiological significance.
Metalloproteins: Reactivity and Physiological Role of Metal Ions
It has been predicted that at least 40% of enzymes need metal ions for their biological function, and more than one-third of the entries in the Protein Data Bank contain at least one metal ion.
A variety of metals are indeed necessary for life, as is widely recognized. The local protein structural environment greatly influences the reactivity and physiological function of metal ions in metalloproteins by regulating the metal’s position inside the active site, its interactions with the substrate, and, for redox-active metals, its reduction potential.
Inaccuracies Due to Hampered Local Quality of 3D Environment
About 88 percent of the 3D protein structures in the Protein Data Bank (PDB) have been solved by X-ray crystallography. Two problems recur in the evaluation of metal-binding sites (MBSs) in these biomolecular structures: identifying the chemical identity of the bound metal ion, and determining whether the detected site is physiologically significant or an artifact of the experimental conditions.
Regarding the first issue, it is well recognized that sample preparation techniques, contamination by unwanted metals, or experimental conditions, such as pH or irradiation, can have an impact on MBS occupancy.
Particle-induced X-ray emission (PIXE) studies on a sample set of 32 metalloproteins from structural genomics programs, for instance, revealed the existence of metal ions bound to the protein that were not present in the PDB structure that was originally deposited.
The lack of clarity regarding the identity of the metal ion in the MBS might further degrade the local quality of the 3D environment, resulting in warped geometries and other errors.
Suggestions from the Neural Classifier
The scientific literature contains many in-depth assessments of MBSs that concentrate on the characteristics of clearly defined, biologically significant sites.
In this paper, the scientists directly addressed the second of the two problems described above by focusing on the comparison of physiological vs. adventitious MBSs.
According to prior research, adventitious sites typically appear at the protein surface and have metal coordination numbers (CNs) that are on the lower end of the range for all sites of a particular metal.
The researchers used the annotations of the zinc- and mononuclear iron-binding sites in the MetalPDB database to distinguish the physiological and adventitious ones, producing a reference dataset to more thoroughly support these claims and perhaps quantify them.
Using a deep learning (DL) method, they made use of this resource to train a classifier exclusively on zinc-binding sites.
DL is becoming increasingly popular in structural bioinformatics, both for the study of experimental structures and for the prediction of 3D protein structures.
Two recent examples that are pertinent to this work are the classification of enzymatic vs. non-enzymatic metal ions in proteins and the discovery of water interaction sites.
The neural classifier identified physiological sites with high accuracy for both zinc and iron, which are both transition metals.
The general characteristics of physiological MBSs emerged from an analysis of the relative importance of the various features in determining the neural network’s performance.
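The article does not say how the relative importance of the features was computed. One common, model-agnostic way to ask the same question is permutation importance: shuffle one feature at a time and measure how much the accuracy drops. The sketch below uses that substitute technique, not the authors' method; the feature names, the toy rows, and the `toy_predict` function are all hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances[name] = sum(drops) / n_repeats
    return importances

# Hypothetical toy sites: one informative feature and one the model ignores.
toy_rows = [
    {"n_protein_ligands": n, "noise": i % 3}
    for i, n in enumerate([1, 1, 2, 2, 4, 4, 5, 6])
]
toy_site_labels = [r["n_protein_ligands"] >= 4 for r in toy_rows]

def toy_predict(row):
    """Stand-in for a trained classifier; it only looks at the ligand count."""
    return row["n_protein_ligands"] >= 4

print(permutation_importance(toy_predict, toy_rows, toy_site_labels))
```

A feature the model relies on shows a clear accuracy drop when shuffled, while a feature it ignores scores zero, which is the kind of ranking that lets general rules be read off a trained classifier.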
Takeaways from the Study
In conclusion, the current work offers the community two things:
- A sizable, well-organized dataset of annotated physiological/adventitious metal sites that can be used in other structural bioinformatics studies of metalloproteins; and
- A freely accessible tool that allows non-experts to analyze new MBSs.
The researchers’ investigation identified a few key features that distinguish physiological sites and may be universally applicable, at least for transition metal ions.
The Endpoint
To categorize zinc(II)-binding sites in the 3D structures of proteins as physiological or adventitious, the scientists trained a deep neural network.
The deep neural network classifier not only performed very strongly on zinc sites but also achieved a remarkable accuracy on non-heme mononuclear iron sites.
Using the indications offered by the study of feature importance, they were able to identify a few specific structural characteristics that can be used as guidelines to discriminate between physiological and adventitious sites.
MBSs with 20 or more protein residues, as well as sites with four or more metal ligands supplied by the protein chain, are very likely to be physiological.
This feature must be evaluated with caution, however, because some of the amino acids may come from sequence tags that are introduced, for example, to speed up protein purification (such as poly-His tags) and are not physiologically relevant.
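The two numeric guidelines above can be written down as a small screening function. This is only a sketch of the rules as summarized in this article, not the published classifier: the `MetalSite` fields are hypothetical names, and subtracting tag-derived ligands from the ligand count is an assumption about how to apply the poly-His caveat.

```python
from dataclasses import dataclass

@dataclass
class MetalSite:
    n_site_residues: int      # protein residues in the metal environment
    n_protein_ligands: int    # metal ligands supplied by the protein chain
    n_tag_ligands: int = 0    # ligands that come from a purification tag (assumed known)

def likely_physiological(site, min_residues=20, min_ligands=4):
    """True if either guideline is met; False means the rules give no verdict."""
    # Assumption of this sketch: tag-derived ligands are not counted.
    native_ligands = site.n_protein_ligands - site.n_tag_ligands
    return site.n_site_residues >= min_residues or native_ligands >= min_ligands

print(likely_physiological(MetalSite(n_site_residues=25, n_protein_ligands=2)))
print(likely_physiological(MetalSite(n_site_residues=10, n_protein_ligands=4, n_tag_ligands=2)))
```

A `False` result should not be read as "adventitious": the guidelines only flag sites that are very likely physiological, and sites that meet neither threshold still need the full classifier or expert inspection.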
The solvent accessibility of the metal ligands is another crucial characteristic, but due to the width of the related distributions, it is impossible to determine an accurate cut-off value in this situation.
In physiological MBSs, however, metal ligands often have minimal solvent accessibility. These guidelines should hold at least for “simple” (mononuclear) sites that contain transition metal ions.
Notably, the classification of complex metal cofactors, such as polymetallic clusters or organometallic cofactors, should be simpler than the classification achieved here.
This is mainly due to the fact that these sites are highly unlikely to assemble in the absence of specific biosynthetic systems or precisely calibrated chemical conditions during sample preparation.
The scientific community can use the current deep neural network classifier for free as a standalone tool to annotate zinc and iron sites in recently discovered 3D structures of metalloproteins.
This enables researchers to keep up with structural biology projects’ increasing throughput and enables the functional study of metal sites even for those unfamiliar with bioinorganic chemistry.
Article Source: Learning to Identify Physiological and Adventitious Metal-Binding Sites in the Three-Dimensional Structures of Proteins by Following the Hints of a Deep Neural Network Vincenzo Laveglia, Andrea Giachetti, Davide Sala, Claudia Andreini, and Antonio Rosato. Journal of Chemical Information and Modeling 2022 62 (12), 2951-2960.
DOI: https://doi.org/10.1021/acs.jcim.2c00522
Tanveen Kaur is a consulting intern at CBIRT. She is currently pursuing post-graduation in Biotechnology at Shoolini University, Himachal Pradesh. Her interests lie primarily in researching new advancements in biotechnology and bioinformatics, and she dreams of becoming one of the best researchers.