Bioactivity assessment of natural compounds using machine learning models trained on target similarity between drugs

Periwal Vinita; Bassler Stefan; Andrejev Sergej; Gabrielli Natalia; Patil Kaustubh Raosaheb; Typas Athanasios; Patil Kiran Raosaheb

Abstract

Natural compounds constitute a rich resource of potential small-molecule therapeutics. While experimental access to this resource is limited by its vast diversity and by difficulties in systematic purification, computational assessment of structural similarity with known therapeutic molecules offers a scalable approach. Here, we assessed functional similarity between natural compounds and approved drugs by combining multiple chemical similarity metrics and physicochemical properties using a machine-learning approach. We computed pairwise similarities between 1410 drugs for training classification models and used the drugs' shared protein targets as class labels. The best-performing models were random forests, which gave an average area under the ROC curve of 0.9, a Matthews correlation coefficient of 0.35, and an F1 score of 0.33, suggesting that they captured the structure-activity relation well. The models were then used to predict protein targets of approximately 11,000 natural compounds by comparing them with the drugs. This revealed the therapeutic potential of several natural compounds, including those with support from previously published sources as well as those hitherto unexplored. We experimentally validated one of the predicted activities, namely Cox-1 inhibition by 5-methoxysalicylic acid, a molecule commonly found in tea, herbs and spices. In contrast, another natural compound, 4-isopropylbenzoic acid, which had the highest similarity score under the most heavily weighted similarity metric but was not picked by our models, did not inhibit Cox-1. Our results demonstrate the utility of a machine-learning approach combining multiple chemical features for uncovering the protein-binding potential of natural compounds.

Author summary

A large fraction of small-molecule drugs has originated from natural compounds, making them an attractive resource in the search for potential lead compounds. Yet this resource has not been extensively explored because of the vast number of natural compounds and the technical barriers to obtaining them in pure form. Computational approaches can expedite exploration of natural compounds and their derivatives at a much larger scale. Towards this, we took advantage of the known protein targets of drugs to mine natural compounds with similarity to known small-molecule drugs. The underlying hypothesis is that two compounds binding to the same protein target are similar from a bioactivity viewpoint. To identify high-dimensional structural features of the compounds underlying their bioactivity, we computed various structural features of paired drugs (i.e., drugs sharing a common protein target) and used these to train machine-learning classifiers. The trained classification models were then used to predict similarity between drugs and natural compounds. We assessed the resulting predictions (protein target binding by natural compounds) through an extensive literature survey, and experimentally validated a novel prediction. Together, our results outline a workflow and provide a resource to explore the therapeutic potential of natural compounds.
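The pair-labelling idea described above (drug pairs as samples, a shared protein target as the positive class) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the drug names, fingerprint bits, target assignments, and the 0.55 cutoff are hypothetical toy values, and a single Tanimoto threshold stands in for the random forest over multiple similarity metrics and physicochemical properties that the abstract reports. The sketch only shows how pairs are labelled and how the Matthews correlation coefficient and F1 score are computed from the resulting confusion counts.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data (not from the paper): fingerprint "on" bits
# and known protein targets for five made-up drugs.
drugs = {
    "drug_a": {"bits": {1, 4, 7, 9}, "targets": {"PTGS1"}},
    "drug_b": {"bits": {1, 4, 7, 12}, "targets": {"PTGS1", "PTGS2"}},
    "drug_c": {"bits": {2, 3, 15}, "targets": {"ADRB2"}},
    "drug_d": {"bits": {2, 3, 15, 18}, "targets": {"ADRB2"}},
    "drug_e": {"bits": {1, 4, 7, 9, 20}, "targets": {"HTR2A"}},
}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_pairs(drugs):
    """Return (name1, name2, similarity, label) for every drug pair.

    The label is 1 when the two drugs share at least one target, else 0.
    """
    rows = []
    for n1, n2 in combinations(sorted(drugs), 2):
        sim = tanimoto(drugs[n1]["bits"], drugs[n2]["bits"])
        label = int(bool(drugs[n1]["targets"] & drugs[n2]["targets"]))
        rows.append((n1, n2, sim, label))
    return rows


def mcc_and_f1(y_true, y_pred):
    """Matthews correlation coefficient and F1 from binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return mcc, f1


pairs = build_pairs(drugs)
y_true = [label for _, _, _, label in pairs]

# Stand-in classifier (an assumption): call a pair "same target" when
# its Tanimoto similarity reaches an arbitrary cutoff of 0.55.
THRESHOLD = 0.55
y_pred = [int(sim >= THRESHOLD) for _, _, sim, _ in pairs]

mcc, f1 = mcc_and_f1(y_true, y_pred)
print(f"pairs={len(pairs)} MCC={mcc:.3f} F1={f1:.3f}")
```

In this toy set, drug_a and drug_e have highly similar fingerprints but no shared target, so the single-metric threshold produces a false positive. That mirrors, in a very simplified way, the abstract's observation that the compound with the highest score on one similarity metric did not inhibit Cox-1, which is the motivation for combining several features in a trained model.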

