Journal: SAR and QSAR in Environmental Research

Integrating background knowledge from internet databases into predictive toxicology models



Abstract

While data integration for data analysis has been investigated extensively in biological applications, it has so far received much less attention in computational chemistry and quantitative structure-activity relationship (QSAR) research. With the growing number of chemical databases available on the web, such data integration efforts become an intriguing possibility (and, in fact, a necessity). In this paper, we take a first step towards the following scenario for predictive toxicology applications. Given a new structure to be predicted, the first step is to gather (integrate) all relevant information from internet databases for the structure itself and for all structures with available information on the endpoint of interest. In a second step, the collected information is combined statistically into a prediction for the new structure. We simulate this scenario with three endpoints (data sets) from the DSSTox database and collect information from three public chemical databases: PubChem, ChemBank and Sigma-Aldrich. In the experiments, we investigate whether adding background knowledge from the three databases improves predictive performance over using chemical structure alone in a statistically significant way. For this purpose, we define groups of features (belonging together from an application point of view) from the three databases and perform a variant of forward selection to include these feature groups in a prediction model. Our experiments show that the integration of background knowledge from internet databases can significantly improve prediction performance, especially for regression tasks.

Keywords: (Q)SAR, data integration, internet databases, cheminformatics, machine learning

Permalink: http://dx.doi.org/10.1080/10629360903560579
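The group-wise forward selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify their variant, so the code below shows plain greedy forward selection over whole feature groups. The function name `forward_select_groups`, the `min_gain` stopping threshold (a stand-in for the statistical significance test), the group names, and the toy scoring function are all hypothetical. In practice `score` would be a cross-validated performance estimate of a model trained on the given features.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def forward_select_groups(
    groups: Dict[str, List[str]],
    base_features: Sequence[str],
    score: Callable[[Sequence[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedily add whole feature groups while the score keeps improving.

    groups        maps a group name to the feature names it contains
    base_features are always included (e.g. structure-derived descriptors)
    score         returns a higher-is-better estimate for a feature list
    min_gain      hypothetical threshold; a stand-in for a significance test
    """
    selected: List[str] = []
    features = list(base_features)
    best = score(features)
    remaining = set(groups)
    while remaining:
        # Score the current feature set extended by each remaining group.
        candidates = {
            name: score(features + groups[name]) for name in sorted(remaining)
        }
        name = max(candidates, key=candidates.get)
        if candidates[name] - best <= min_gain:
            break  # no remaining group improves the score enough
        selected.append(name)
        features += groups[name]
        best = candidates[name]
        remaining.remove(name)
    return selected, best


# Toy setup with made-up groups and a made-up score (assumptions only).
toy_groups = {
    "pubchem_bioassays": ["assay_a", "assay_b"],
    "chembank_props": ["logp"],
    "vendor_safety": ["risk_code"],
}
usefulness = {"fp": 0.5, "assay_a": 0.2, "assay_b": 0.05, "logp": 0.1, "risk_code": 0.0}


def toy_score(features: Sequence[str]) -> float:
    # Reward useful features, penalise model size slightly.
    return sum(usefulness[f] for f in features) - 0.02 * len(features)


selected, best = forward_select_groups(toy_groups, ["fp"], toy_score)
print(selected, round(best, 2))
```

In this toy run the two informative groups are added and the uninformative one is left out, because adding it would lower the penalised score. Selecting whole groups rather than single features keeps descriptors that belong together from an application point of view in or out of the model as a unit.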
