BMC Bioinformatics

A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment



Abstract

Background: Many practical tasks in biomedicine require accessing specific types of information in scientific literature, e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. For example, a simple section-based scheme assigns individual sentences in abstracts to sections such as Objective, Methods, Results and Conclusions. Some schemes of textual information structure have proved useful for biomedical text mining (BIO-TM) tasks (e.g. automatic summarization). However, user-centered evaluation in the context of real-life tasks has been lacking.

Methods: We take three schemes of different type and granularity (those based on section names, Argumentative Zones (AZ) and Core Scientific Concepts (CoreSC)) and evaluate their usefulness for a real-life task which focuses on biomedical abstracts: Cancer Risk Assessment (CRA). We annotate a corpus of CRA abstracts according to each scheme, develop classifiers for automatic identification of the schemes in abstracts, and evaluate both the manual and automatic classifications directly as well as in the context of CRA.

Results: Our results show that for each scheme, the majority of categories appear in abstracts, although two of the schemes (AZ and CoreSC) were developed originally for full journal articles. All the schemes can be identified in abstracts relatively reliably using machine learning. Moreover, when cancer risk assessors are presented with scheme-annotated abstracts, they find relevant information significantly faster than when presented with unannotated abstracts, even when the annotations are produced using an automatic classifier. Interestingly, in this user-based evaluation the coarse-grained scheme based on section names proved nearly as useful for CRA as the finest-grained CoreSC scheme.

Conclusions: We have shown that existing schemes aimed at capturing the information structure of scientific documents can be applied to biomedical abstracts and can be identified in them automatically with an accuracy which is high enough to benefit a real-life task in biomedicine.
