PLOS Computational Biology
Bioactivity assessment of natural compounds using machine learning models trained on target similarity between drugs
Natural compounds constitute a rich resource of potential small-molecule therapeutics. While experimental access to this resource is limited by its vast diversity and by difficulties in systematic purification, computational assessment of structural similarity with known therapeutic molecules offers a scalable approach. Here, we assessed functional similarity between natural compounds and approved drugs by combining multiple chemical similarity metrics and physicochemical properties using a machine-learning approach. We computed pairwise similarities between 1410 drugs for training classification models and used the drugs' shared protein targets as class labels. The best-performing models were random forests, which gave an average area under the ROC curve of 0.9, a Matthews correlation coefficient of 0.35, and an F1 score of 0.33, suggesting that they captured the structure-activity relation well. The models were then used to predict protein targets of circa 11k natural compounds by comparing them with the drugs. This revealed the therapeutic potential of several natural compounds, including those with support from previously published sources as well as those hitherto unexplored. We experimentally validated the activity of one of the predicted pairs, viz., Cox-1 inhibition by 5-methoxysalicylic acid, a molecule commonly found in tea, herbs and spices. In contrast, another natural compound, 4-isopropylbenzoic acid, which had the highest similarity score under the most heavily weighted similarity metric but was not picked by our models, did not inhibit Cox-1. Our results demonstrate the utility of a machine-learning approach combining multiple chemical features for uncovering the protein binding potential of natural compounds.

Author summary

A large fraction of small-molecule drugs have originated from natural compounds, making them an attractive resource in the search for potential lead compounds. Yet this resource has not been extensively explored because of the vast number of natural compounds and the technical barriers to obtaining them in pure form. Computational approaches can expedite the exploration of natural compounds and their derivatives at a much larger scale. Towards this, we took advantage of the known protein targets of drugs to mine natural compounds with similarity to known small-molecule drugs. The underlying hypothesis is that two compounds binding to the same protein target are similar from a bioactivity viewpoint. To identify high-dimensional structural features of the compounds underlying their bioactivity, we computed various structural features of paired drugs (i.e., drugs sharing a common protein target) and used these to train machine learning classifiers. The trained classification models were then used to predict similarity between drugs and natural compounds. We assessed the resulting predictions (protein target binding by natural compounds) through an extensive literature survey, and experimentally validated a novel prediction. Together, our results outline a workflow and provide a resource for exploring the therapeutic potential of natural compounds.
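The pair-labeling idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline. The drug names, fingerprint bits, and targets are hypothetical toy data; a single Tanimoto similarity stands in for the multiple similarity metrics and physicochemical properties; and a fixed threshold stands in for the random forest classifier. The sketch only shows how drug pairs might be labeled by shared targets and how the Matthews correlation coefficient and F1 score are computed from the resulting predictions.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: names, fingerprint bits, and targets are illustrative only.
drugs = {
    "drug_a": {"bits": {1, 2, 3, 4}, "targets": {"T1"}},
    "drug_b": {"bits": {2, 3, 4, 5}, "targets": {"T1", "T2"}},
    "drug_c": {"bits": {7, 8, 9}, "targets": {"T3"}},
    "drug_d": {"bits": {1, 8, 9, 10}, "targets": {"T3"}},
}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_pairs(drugs):
    """Return (pair, similarity, label) rows; label is 1 if the pair shares a target."""
    rows = []
    for x, y in combinations(sorted(drugs), 2):
        sim = tanimoto(drugs[x]["bits"], drugs[y]["bits"])
        label = int(bool(drugs[x]["targets"] & drugs[y]["targets"]))
        rows.append(((x, y), sim, label))
    return rows


def threshold_predict(rows, threshold=0.5):
    """Stand-in classifier (an assumption, not a random forest): similar if sim >= threshold."""
    return [int(sim >= threshold) for _, sim, _ in rows]


def mcc_and_f1(y_true, y_pred):
    """Matthews correlation coefficient and F1 score from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return mcc, f1


rows = build_pairs(drugs)
labels = [label for _, _, label in rows]
preds = threshold_predict(rows)
mcc, f1 = mcc_and_f1(labels, preds)
print(f"pairs={len(rows)} MCC={mcc:.2f} F1={f1:.2f}")
```

Because most drug pairs do not share a target, the labels are heavily imbalanced. In such a setting a high area under the ROC curve can coexist with modest MCC and F1 values, which is why it helps to report all three.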