Journal: International Journal of Data Mining and Bioinformatics

Improving named entity recognition accuracy for gene and protein in biomedical text literature



Abstract

Recognising biomedical named entities in natural-language documents, a task known as biomedical Named Entity Recognition (NER), has attracted many researchers because of the complex nature of such texts. This complexity includes character-level, word-level and word-order variations. In this study, we propose an approach for recognising gene and protein names that handles these issues. As in previous related work, our approach is based on the assumption that a named entity occurs within a noun group. The strength of the proposed approach lies in a Statistical Character-based Syntax Similarity (SCSS) algorithm, which measures the similarity between the extracted candidates and the well-known biomedical named entities from the GENIA V3.0 corpus. The proposed approach was evaluated and the results are satisfactory. For recognition of both gene and protein names, we achieved 97.2% precision (P), 95.2% recall (R) and an F-measure of 96.1. For protein name recognition, we obtained 98.1% P, 97.5% R and an F-measure of 97.7.
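The reported F-measures can be sanity-checked against the reported precision and recall. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of P and R); the abstract does not state which weighting was used, so `beta = 1` is an assumption, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 (assumed here) gives the harmonic mean of P and R."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both P and R are zero: define F as zero instead of dividing by zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# (P, R, reported F) in percent, copied from the abstract.
reported = {
    "gene and protein names": (97.2, 95.2, 96.1),
    "protein names": (98.1, 97.5, 97.7),
}

for task, (p, r, f_reported) in reported.items():
    f_computed = f_measure(p, r)
    print(f"{task}: computed F1 = {f_computed:.2f}, reported F = {f_reported}")
```

Under this assumption the computed values fall within about 0.1 of the reported ones, a gap that is consistent with P and R having been rounded to one decimal place before publication.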
