Identifying gene and protein names from biological texts

Abstract

Extracting and identifying gene and protein names from the literature is a critical step in mining functional information about genes and proteins. While extensive effort has been devoted to this important task, most approaches aim to extract gene/protein names per se, paying little attention to associating the extracted names with existing gene and protein database entries. We developed a simple and efficient method to identify gene and protein names in the literature using a combination of heuristic and statistical strategies. Our approach maps the extracted names to individual LocusLink entries, thus enabling seamless integration of literature information with existing gene/protein databases. Evaluation on a test corpus shows that our method achieves both high recall and high precision. Our method performs well and can be used as a building block for large biomedical literature mining systems.
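The abstract does not spell out the heuristic and statistical strategies, so what follows is only a minimal Python sketch of the general idea of mapping extracted names to database entries, under stated assumptions. A tiny hand-written synonym dictionary stands in for LocusLink. The IDs `LL:0001` and `LL:0002` are hypothetical placeholders, not real LocusLink entries. The normalization rule (case-folding and dropping spaces, hyphens, and punctuation) is an assumed heuristic, and no statistical filtering is modeled. The helper names (`normalize`, `build_index`, `tag_names`, `precision_recall`) are likewise hypothetical. `precision_recall` shows how the two evaluation measures would be computed against a gold-standard set of entry IDs.

```python
import re

def normalize(name: str) -> str:
    """Assumed heuristic: case-fold and drop spaces, hyphens and punctuation,
    so that 'BRCA-1' and 'brca1' normalize to the same key."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def build_index(synonyms: dict[str, str]) -> dict[str, str]:
    """Map each normalized synonym to its (placeholder) database entry ID."""
    return {normalize(name): entry_id for name, entry_id in synonyms.items()}

def tag_names(text: str, index: dict[str, str], max_len: int = 3) -> list[tuple[str, str]]:
    """Greedy longest-match lookup of 1..max_len token windows against the index.

    Returns (matched surface string, entry ID) pairs in text order.
    """
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9-]*", text)
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so multi-word names win over their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n])
            entry_id = index.get(normalize(surface))
            if entry_id is not None:
                hits.append((surface, entry_id))
                i += n  # consume the matched tokens
                break
        else:
            i += 1  # no window starting here matched; move on one token
    return hits

def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |gold|, over entry IDs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical toy dictionary; the IDs are placeholders, not real LocusLink entries.
SYNONYMS = {
    "p53": "LL:0001",
    "tumor protein p53": "LL:0001",
    "BRCA1": "LL:0002",
    "breast cancer 1": "LL:0002",
}

index = build_index(SYNONYMS)
hits = tag_names("Loss of Tumor Protein P53 and BRCA-1 was observed.", index)
print(hits)  # [('Tumor Protein P53', 'LL:0001'), ('BRCA-1', 'LL:0002')]
```

Greedy longest-match lookup makes the three-token name "Tumor Protein P53" win over a shorter overlapping match. A real system would also have to resolve ambiguous synonyms that map to several entries, which this sketch ignores.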