Journal: Bioinformatics

Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review

Abstract

Motivation: Techniques that are capable of automatically analyzing the information structure of scientific articles could be highly useful for improving information access to biomedical literature. However, most existing approaches rely on supervised machine learning (ML) and substantial labeled data that are expensive to develop and apply to different sub-fields of biomedicine. Recent research shows that minimal supervision is sufficient for fairly accurate information structure analysis of biomedical abstracts. However, is it realistic for full articles given their high linguistic and informational complexity? We introduce and release a novel corpus of 50 biomedical articles annotated according to the Argumentative Zoning (AZ) scheme, and investigate active learning with one of the most widely used ML models, Support Vector Machines (SVM), on this corpus. Additionally, we introduce two novel applications that use AZ to support real-life literature review in biomedicine via question answering and summarization.

Results: We show that active learning with SVM trained on 500 labeled sentences (6% of the corpus) performs surprisingly well, with an accuracy of 82%, just 2% lower than fully supervised learning. In our question answering task, biomedical researchers find relevant information significantly faster from AZ-annotated than from unannotated articles. In the summarization task, sentences extracted from particular zones are significantly more similar to gold standard summaries than those extracted from particular sections of full articles. These results demonstrate that active learning of full articles' information structure is indeed realistic and the accuracy is high enough to support real-life literature review in biomedicine.

Availability: The annotated corpus, our AZ classifier and the two novel applications are available at http://www.cl.cam.ac.uk/~yg244/12bioinfo.html.
