Conference: 16th Workshop on Biomedical Natural Language Processing

Creation and evaluation of a dictionary-based tagger for virus species and proteins

Abstract

Text mining automatically extracts information from the literature with the goal of making it available for further analysis, for example by incorporating it into biomedical databases. A key first step towards this goal is to identify and normalize the named entities, such as proteins and species, which are mentioned in text. Despite the large detrimental impact that viruses have on human and agricultural health, very little previous text-mining work has focused on identifying virus species and proteins in the literature. Here, we present an improved dictionary-based system for viral species and the first dictionary for viral proteins, which we benchmark on a new corpus of 300 manually annotated abstracts. We achieve 81.0% precision and 72.7% recall at the task of recognizing and normalizing viral species and 76.2% precision and 34.9% recall on viral proteins. These results are achieved despite the many challenges involved with the names of viral species and, especially, proteins. This work provides a foundation that can be used to extract more complicated relations about viruses from the literature.
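
The abstract describes dictionary-based recognition and normalization, evaluated by precision and recall. The following is a minimal sketch of that general idea, not the authors' system: the dictionary entries, the identifiers (`SPECIES:1`, `PROTEIN:1`), and the function names `tag` and `precision_recall` are all hypothetical, and the matching strategy (case-insensitive, longest name first, no overlaps) is an assumption chosen for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

Match = Tuple[int, int, str]  # (start offset, end offset, normalized identifier)

# Hypothetical toy dictionary: name -> placeholder identifier.
# These identifiers are illustrative and do not come from any real database.
TOY_DICTIONARY: Dict[str, str] = {
    "influenza a virus": "SPECIES:1",
    "zika virus": "SPECIES:2",
    "hemagglutinin": "PROTEIN:1",
}


def tag(text: str, dictionary: Dict[str, str]) -> List[Match]:
    """Find dictionary names in text and map each to its identifier."""
    matches: List[Match] = []
    # Try longer names first so they take priority over shorter names they contain.
    for name in sorted(dictionary, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = m.start(), m.end()
            # Skip a candidate that overlaps an already accepted match.
            if any(s < end and start < e for s, e, _ in matches):
                continue
            matches.append((start, end, dictionary[name]))
    return sorted(matches)


def precision_recall(predicted: Set[Match], gold: Set[Match]) -> Tuple[float, float]:
    """Precision and recall over exact (span, identifier) matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = "Hemagglutinin of Influenza A virus binds sialic acid."
    predicted = tag(sentence, TOY_DICTIONARY)
    # Hypothetical gold annotations for the toy sentence.
    gold = {(0, 13, "PROTEIN:1"), (17, 34, "SPECIES:1"), (41, 52, "CHEMICAL:1")}
    print(predicted)
    print(precision_recall(set(predicted), gold))
```

In this sketch a prediction counts as correct only when both the span and the identifier match a gold annotation, which is one common way to score joint recognition and normalization; the paper's exact matching criteria may differ.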

Bibliographic details

  • Conference location: Vancouver (CA)
  • Author affiliations

    Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark;

    Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark;

    Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark;

    TUM, Department of Informatics, Bioinformatics & Computational Biology, I12, Boltzmannstr. 3, 85748 Garching/Munich, Germany;

    Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark;

  • Original format: PDF
  • Language: English
