BMC Bioinformatics

Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD)



Abstract

Background: The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding about the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, published literature. The goals of the research reported here were to establish a baseline analysis of current CTD curation, develop a text-mining prototype from readily available open source components, and evaluate its potential value in augmenting curation efficiency and increasing data coverage.

Results: Prototype text-mining applications were developed and evaluated using a CTD data set consisting of manually curated molecular interactions and relationships from 1,600 documents. Preliminary results indicated that the prototype found 80% of the gene, chemical, and disease terms appearing in curated interactions. These terms were used to re-rank documents for curation, resulting in increases in mean average precision (63% for the baseline vs. 73% for a rule-based re-ranking), and in the correlation coefficient of rank vs. number of curatable interactions per document (baseline 0.14 vs. 0.38 for the rule-based re-ranking).

Conclusion: This text-mining project is unique in its integration of existing tools into a single workflow with direct application to CTD. We performed a baseline assessment of the inter-curator consistency and coverage in CTD, which allowed us to measure the potential of these integrated tools to improve prioritization of journal articles for manual curation. Our study presents a feasible and cost-effective approach for developing a text mining solution to enhance manual curation throughput and efficiency.
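The abstract reports mean average precision (MAP) for document rankings but does not define it. As a rough illustration only, the sketch below uses the standard information-retrieval definition, which is an assumption here and may differ from the authors' exact evaluation. The function names and the toy relevance lists are hypothetical and are not CTD data.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision of one ranked list.

    relevance[i] is True when the document at rank i + 1 is relevant
    (for example, when it contains curatable interactions).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant documents seen so far / rank.
            precision_sum += hits / rank
    # A list with no relevant documents is scored 0.0 by convention.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-list average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical toy rankings (not from the paper): the same four documents,
# two of them relevant, in a baseline order and a re-ranked order.
baseline = [False, True, False, True]
reranked = [True, True, False, False]

print(average_precision(baseline))  # 0.5
print(average_precision(reranked))  # 1.0
```

Moving relevant documents toward the top of the list raises average precision, which is the effect the reported 63% vs. 73% comparison measures.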
