Conference: 11th Workshop on Biomedical Natural Language Processing (2012)

Classifying Gene Sentences in Biomedical Literature by Combining High-Precision Gene Identifiers


Abstract

Gene name identification is a fundamental step in solving more complicated text mining problems such as gene normalization and protein-protein interactions. However, state-of-the-art name identification methods are not yet sufficient for use in a fully automated system. A relaxed task, gene/protein sentence identification, may therefore be more useful for manually searching and browsing the biomedical literature. In this paper, we set up a new task, gene/protein sentence classification, and propose an ensemble approach to address it. Well-known named entity tools use similar gold-standard sets for training and testing, which results in relatively poor performance on unseen sets. Here we explore how to combine diverse high-precision gene identifiers for more robust performance. The experimental results show that the proposed approach outperforms BANNER as a stand-alone classifier on newly annotated sets as well as on previous gold-standard sets.
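
The abstract does not say which identifiers are combined or how, so the following is only a minimal sketch of the general idea: several narrow, high-precision detectors each vote on whether a sentence mentions a gene or protein, and the sentence is labeled positive when enough of them fire. The lexicon, the regular expressions, the function names, and the `min_votes` threshold are all illustrative assumptions, not the authors' method.

```python
import re
from typing import Callable, List

# An "identifier" is any function that reports whether a sentence appears
# to mention a gene or protein. The three below are toy stand-ins.
Identifier = Callable[[str], bool]

# Hypothetical mini-lexicon; a real system would use a curated resource.
GENE_LEXICON = {"BRCA1", "TP53", "EGFR"}


def lexicon_identifier(sentence: str) -> bool:
    """Fire when any token matches an entry in the toy lexicon."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    return any(token.upper() in GENE_LEXICON for token in tokens)


def symbol_pattern_identifier(sentence: str) -> bool:
    """Fire on symbol-like tokens: 2-5 capitals followed by 1-3 digits."""
    return re.search(r"\b[A-Z]{2,5}\d{1,3}\b", sentence) is not None


def context_identifier(sentence: str) -> bool:
    """Fire when a token containing a digit precedes 'gene' or 'protein'."""
    return re.search(r"\b\w*\d\w*\s+(?:gene|protein)\b", sentence) is not None


def classify_sentence(
    sentence: str, identifiers: List[Identifier], min_votes: int = 1
) -> bool:
    """Label a sentence as a gene sentence if at least min_votes identifiers fire."""
    votes = sum(1 for identify in identifiers if identify(sentence))
    return votes >= min_votes


IDENTIFIERS: List[Identifier] = [
    lexicon_identifier,
    symbol_pattern_identifier,
    context_identifier,
]
```

With `min_votes=1` the ensemble acts as a union, which trades some precision for recall; raising `min_votes` requires agreement between identifiers and moves the trade-off the other way. For example, `classify_sentence("The p53 protein regulates the cell cycle.", IDENTIFIERS)` is positive because only the context rule fires, while the same call with `min_votes=2` is negative.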

Bibliographic details

  • Conference location: Montreal (CA)
  • Author affiliation (the same for all four listed entries): National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
  • Original format: PDF
  • Language of text: English
