Researchers at Michigan State University in the United States combined extensive structural datasets with a refined, annotated collection of antiviral phytochemicals to determine which naturally derived compounds are most likely to bind strongly to SARS-CoV-2 proteins and thereby prevent or disrupt viral infection.
The COVID-19 pandemic has killed millions of people worldwide. Despite the approval of several safe and effective vaccines and medications, the problem remains unresolved for people with underlying medical conditions and for those living in underserved areas that lack vaccines or adequate medical infrastructure. The emergence of new SARS-CoV-2 variants makes this harder still. One possible answer is to use naturally abundant phytochemicals in a polypharmacological approach that targets several critical viral proteins at once. Such an approach could produce stronger functional inhibition and protect against escape mutations. A significant benefit of combination treatment (polypharmacology) is that the virus would have to acquire many mutations simultaneously to become resistant to every compound in the combination.
Plant-derived antiviral phytochemicals are a viable starting point for screening and discovering compounds that are effective against SARS-CoV-2. Machine learning has already benefited target identification and validation, compound screening and lead discovery, preclinical research, and clinical development.
The researchers screened numerous non-structural proteins and two versions of the structural spike protein, chosen for their critical roles in viral replication and infection. They ran ligand docking simulations (structure-based virtual screening, or SBVS) to estimate docking free energies between the antiviral phytochemicals and the proteins. By examining the distribution of docking energy scores for each protein, they identified lead compounds with high affinity for individual protein structures. Because high-resolution docking simulations are time-consuming, SBVS could not be run for every phytochemical of interest. Machine learning algorithms were therefore used to predict potential leads in a second, large phytochemical library. Using the leads selected computationally from the docking simulations, the researchers identified lead clusters by evaluating the absolute or relative abundance of lead phytochemicals in each cluster. They then applied their clustering algorithm to the large unscreened library. Overall, the study identified 62 lead phytochemicals that may inhibit one or more SARS-CoV-2 proteins. In a SwissADME drug-likeness screen, eighteen of the leads showed promising results. The study also concludes that machine learning speeds up ligand screening and yields a four-fold increase in lead compound output.
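The idea of flagging lead clusters by the abundance of known leads can be illustrated with a small Python sketch. It is not the authors' code: the function name, the toy compound names, and both thresholds (min_count and min_fraction) are assumptions made for illustration only.

```python
from collections import Counter

def find_lead_clusters(cluster_of, leads, min_count=3, min_fraction=0.25):
    """Return the ids of clusters that look enriched in lead compounds.

    cluster_of: dict mapping compound name -> cluster id
    leads: set of compound names that docking flagged as leads
    A cluster is flagged if it holds at least min_count leads (absolute
    abundance) or if leads make up at least min_fraction of its members
    (relative abundance). Both thresholds are hypothetical.
    """
    sizes = Counter(cluster_of.values())
    lead_counts = Counter(cluster_of[c] for c in leads if c in cluster_of)
    flagged = set()
    for cluster_id, size in sizes.items():
        n_leads = lead_counts.get(cluster_id, 0)
        if n_leads >= min_count or n_leads / size >= min_fraction:
            flagged.add(cluster_id)
    return flagged

# Toy screened library: compound name -> cluster id (placeholder names).
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 0,
              "e": 1, "f": 1,
              "g": 2, "h": 2, "i": 2, "j": 2, "k": 2}
leads = {"a", "b", "e"}
lead_clusters = find_lead_clusters(cluster_of, leads)

# Unscreened compounds that a classifier assigned to clusters; those that
# land in a lead cluster become candidates for follow-up docking.
new_assignments = {"x": 0, "y": 2, "z": 1}
candidates = sorted(c for c, cid in new_assignments.items() if cid in lead_clusters)
print(lead_clusters, candidates)
```

Only the candidates that fall into a flagged cluster need to go through the expensive docking step, which is where the time saving would come from.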
Before running high-resolution docking between the phytochemical libraries and individual SARS-CoV-2 protein structures, the researchers used CASTp software to identify concave regions of the protein surface that may facilitate ligand binding.
Score functions estimate the physicochemical contributions of the interacting molecules during protein-ligand docking. After testing multiple score functions on two distinct SARS-CoV-2 protein structures, the researchers chose RosettaLigand as the SBVS score function. To identify combinations of promising phytochemical binders against several SARS-CoV-2 proteins (structural and non-structural), they used the Rosetta high-resolution protein-ligand docking protocol (SBVS) together with machine-learning-based ligand clustering (ligand-based virtual screening, or LBVS).
Classification accuracy is an important metric for comparing models because the predictions depend mainly on how molecules are classified. High Shannon entropy is another desirable feature, since it indicates that compounds are spread evenly across classes, which helps avoid class imbalance. After comparing several methods, including Ward hierarchical clustering, spectral clustering, affinity propagation clustering, and OPTICS, the researchers found that Ward hierarchical clustering combined with random forest classification gave the best results: 52 clusters, an accuracy of 88%, and a Shannon entropy of 0.943. This combination was therefore chosen to cluster and classify the phytochemicals. When ligands from the original screen were clustered, the chemical features of flavones and alkaloids proved to be the most predictive of lead compounds.
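The two model-selection metrics mentioned here can be sketched in a few lines of Python. This is an illustration, not the study's implementation: the function names and toy labels are hypothetical, and dividing the entropy by the logarithm of the number of clusters (so that a perfectly even split scores 1) is an assumption, made because the reported value of 0.943 lies between 0 and 1.

```python
import math
from collections import Counter

def accuracy(predicted, actual):
    """Fraction of items whose predicted cluster matches the true cluster."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

def normalized_shannon_entropy(labels):
    """Shannon entropy of the cluster-size distribution, scaled to [0, 1].

    Values near 1 mean the clusters have similar sizes; values near 0 mean
    most items sit in a single cluster. The scaling by log(k) is an
    assumption, not something stated in the article.
    """
    counts = Counter(labels)
    k = len(counts)
    if k < 2:
        return 0.0
    n = len(labels)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)

# Toy cluster assignments (placeholder data).
balanced = [0, 0, 1, 1, 2, 2]
skewed = [0, 0, 0, 0, 1, 2]
print(normalized_shannon_entropy(balanced))
print(normalized_shannon_entropy(skewed))
print(accuracy([0, 1, 1, 2], [0, 1, 2, 2]))
```

A clustering that scores well on both metrics assigns molecules reliably and avoids one oversized cluster that would dominate the classifier's training data.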
Combining the clustering and SBVS data, the scientists found 34 lead phytochemicals and eight lead clusters. Because different SARS-CoV-2 protein structures produced distinct energy score distributions, the energy scores were standardized as z-scores so that phytochemical binding could be compared across structures. Each z-score is the number of standard deviations by which a compound's score differs from the sample mean for a given protein structure, where the sample mean is the average of the lowest energy scores from docking the first 272 antiviral phytochemicals (in SBVS) against that structure.
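The z-score standardization can be sketched as follows. The compound names and energy values are invented for illustration, and the use of the sample standard deviation (rather than the population form) is an assumption.

```python
from statistics import mean, stdev

def z_scores(lowest_energies):
    """Standardize docking scores for one protein structure.

    lowest_energies: dict mapping compound name -> lowest docking energy
    obtained for that compound against a single protein structure.
    Returns compound name -> z-score; more negative means stronger
    predicted binding relative to the rest of the library.
    """
    values = list(lowest_energies.values())
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumed)
    return {name: (e - mu) / sigma for name, e in lowest_energies.items()}

# Hypothetical lowest energy scores for four compounds on one structure.
energies = {"cpd_a": -10.0, "cpd_b": -12.0, "cpd_c": -14.0, "cpd_d": -20.0}
z = z_scores(energies)
best = min(z, key=z.get)  # compound with the most negative z-score
print(best, round(z[best], 3))
```

Because each structure is standardized against its own mean and spread, a z-score from one protein can be compared directly with a z-score from another, which raw energy scores do not allow.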
Adding a ligand-based virtual screen (LBVS) raised the lead identification rate from 2.18% (SBVS alone) to 16.44% (SBVS + LBVS). The 1000 new phytochemicals were assigned to the 52 clusters, and 53 of them fell into lead clusters. Based on cluster specificity, the researchers ran 298 docking simulations between these 53 predicted lead phytochemicals and their associated protein structures.
The scientists used SwissADME to obtain drug property data for the lead phytochemicals discovered with the original SBVS and with the combined LBVS and SBVS methods. Eighteen compounds looked promising in all examined categories, with at most one violation.
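A violation-count filter of this kind can be illustrated with Lipinski's rule of five, one of the drug-likeness rule sets that SwissADME reports. The article does not say which rule sets the authors applied, so the choice of Lipinski here is an assumption, and the descriptor values below are invented placeholders rather than data for real compounds.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mol_weight: float   # molecular weight in g/mol
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def lipinski_violations(d):
    """Count how many of Lipinski's four criteria a compound breaks."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_filter(d, max_violations=1):
    """Keep compounds with at most max_violations rule violations."""
    return lipinski_violations(d) <= max_violations

# Hypothetical descriptor values for three placeholder leads.
panel = {
    "lead_1": Descriptors(284.2, 1.9, 3, 6),
    "lead_2": Descriptors(612.7, 2.4, 4, 9),
    "lead_3": Descriptors(745.0, 6.3, 7, 12),
}
kept = [name for name, d in panel.items() if passes_filter(d)]
print(kept)
```

In this toy panel the first compound breaks no rule, the second breaks one (molecular weight), and the third breaks all four, so only the first two survive a limit of one violation.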
Finally, a phytochemical-plant network was built for 17 leads to find plants that contain multiple leads.
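Finding plants that contain several leads amounts to inverting a lead-to-plant mapping. The sketch below shows one way to do that; the function name and all lead and plant names are placeholders, not data from the study.

```python
from collections import defaultdict

def plants_with_multiple_leads(lead_sources, min_leads=2):
    """Invert a lead -> plants mapping and keep plants rich in leads.

    lead_sources: dict mapping lead compound name -> iterable of plant
    names in which that compound occurs.
    Returns plant name -> sorted list of leads, restricted to plants that
    contain at least min_leads distinct leads.
    """
    by_plant = defaultdict(set)
    for lead, plants in lead_sources.items():
        for plant in plants:
            by_plant[plant].add(lead)
    return {plant: sorted(found)
            for plant, found in by_plant.items()
            if len(found) >= min_leads}

# Placeholder network: which plants each lead has been reported in.
lead_sources = {
    "lead_1": ["plant_A", "plant_B"],
    "lead_2": ["plant_A"],
    "lead_3": ["plant_B", "plant_C"],
    "lead_4": ["plant_A"],
}
multi = plants_with_multiple_leads(lead_sources)
print(multi)
```

Plants that appear in the result are natural candidates for a combination approach, since a single source would supply more than one lead compound.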
These findings informed the LBVS, which yielded 28 new lead compounds and a four-fold increase in the lead discovery rate. Applying physicochemical filters to the panel of 62 phytochemical leads narrowed the therapeutically promising compounds to 18. Some plants contain multiple lead compounds with favorable drug-likeness. Rhein and camptothecin, which have high predicted binding affinities to NSP13 (7NIO) and NSP7&8 (6YHU), respectively, stood out for drug-like properties that were superior to remdesivir’s and in many respects comparable to those of Paxlovid, doravirine, and molnupiravir.
These results rest on high-quality simulation data, statistical inference, and machine learning predictions. While recent experimental and computational findings support the medicinal potential of the lead compounds identified in this study, more in vitro and in vivo research is required to confirm ligand function and efficacy. “We expect that our findings and approach will aid in expanding the scope of drug discovery efforts and lowering the high failure rate before expensive lab testing,” the scientists said.
Story Source: Wang, Z., Belecciu, T., Eaves, J., Bachmann, M., & Woldring, D. (2022). Phytochemical Drug Discovery for COVID-19 Using High-resolution Computational Docking and Machine Learning Assisted Binder Prediction. ChemRxiv.
Code Availability: https://github.com/ziruiwang1996/ligand_protein_docking