Journal of Cheminformatics

Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity

Abstract

Despite increasing knowledge in both the chemical and biological domains, the assimilation and exploration of heterogeneous datasets, which encode information about the chemical, bioactivity and phenotypic properties of compounds, remains a challenge because the compounds assayed must overlap across the spaces. Here, we have constructed a novel dataset, larger than we have used in prior work, comprising 579 acute oral toxic compounds and 1427 non-toxic compounds derived from regulatory GHS information, along with their corresponding molecular and protein target descriptors and qHTS in vitro assay readouts from the Tox21 project. We found no clear association between the results of a FAFDrugs4 toxicophore screen and the acute oral toxicity classifications for our compound set; a screen using a subset of the ToxAlerts toxicophores was also of limited utility, with only slight enrichment toward the toxic set (odds ratio of 1.48). We then investigated to what degree toxic and non-toxic compounds could be separated in each of the spaces, to compare their potential contribution to further analyses. Using an LDA projection, we found the largest degree of separation between toxicity classes using chemical descriptors (Cohen’s d of 1.95) and the lowest using qHTS descriptors (Cohen’s d of 0.67). To compare the predictivity of the feature spaces for the toxicity endpoint, we next trained Random Forest (RF) acute oral toxicity classifiers on molecular, protein target or qHTS descriptors. RFs trained on molecular and protein target descriptors were most predictive, with ROC AUC values of 0.80–0.92 and 0.70–0.85, respectively, across three test sets. RFs trained on chemical and protein target descriptors combined exhibited similar predictive performance to the single-domain models (ROC AUC of 0.80–0.91). Model interpretability was improved by the inclusion of protein target descriptors, which allow the identification of specific targets (e.g. Retinal dehydrogenase) with literature links to toxic modes of action (e.g. oxidative stress). The dataset compiled in this study has been made available for future application.

Electronic supplementary material: The online version of this article (10.1186/s13321-019-0356-5) contains supplementary material, which is available to authorized users.
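The two effect-size statistics quoted in the abstract, the odds ratio for toxicophore enrichment and Cohen's d for class separation, can be illustrated with a minimal sketch. The abstract does not say which variant of Cohen's d the authors used, so the pooled-standard-deviation form below is an assumption. The function names, the counts and the scores are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, variance


def odds_ratio(toxic_hit, toxic_miss, nontoxic_hit, nontoxic_miss):
    """Odds of an alert firing in the toxic set divided by the odds in the non-toxic set."""
    return (toxic_hit / toxic_miss) / (nontoxic_hit / nontoxic_miss)


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical 2x2 counts (NOT from the paper): alert hits and misses per class.
print(odds_ratio(30, 70, 20, 80))

# Hypothetical one-dimensional projection scores (NOT from the paper),
# standing in for compounds projected onto a single discriminant axis.
toxic_scores = [2.0, 3.0, 4.0]
nontoxic_scores = [0.0, 1.0, 2.0]
print(cohens_d(toxic_scores, nontoxic_scores))
```

An odds ratio close to 1 means an alert fires about equally often in both classes, which is why the reported value of 1.48 is described as only slight enrichment. A larger Cohen's d means the two class distributions overlap less along the projection.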