Improving named entity recognition accuracy for gene and protein in biomedical text literature

Tohidi Hossein; Ibrahim Hamidah; Murad Masrah Azrifah Azmi

Abstract

Biomedical Named Entity Recognition (NER), the task of recognising biomedical named entities in natural language documents, has attracted many researchers because of the complex nature of such texts. This complexity includes character-level, word-level and word-order variations. In this study, an approach for recognising gene and protein names that handles these issues is proposed. Like previous related work, our approach is based on the assumption that a named entity occurs within a noun group. The strength of the proposed approach lies in a Statistical Character-based Syntax Similarity (SCSS) algorithm, which measures the similarity between the extracted candidates and the well-known biomedical named entities from the GENIA V3.0 corpus. The proposed approach was evaluated and the results are satisfactory. For recognition of both gene and protein names, we achieved 97.2% precision (P), 95.2% recall (R) and an F-measure of 96.1. For protein name recognition, we obtained 98.1% for P, 97.5% for R and 97.7 for F-measure.
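As a quick sanity check on the reported scores, the sketch below recomputes the F-measure from the precision and recall values quoted in the abstract. It assumes the balanced F1 (harmonic mean of P and R), which the abstract does not state explicitly; the function name `f_measure` and its `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the balanced harmonic mean (F1).

    Assumption: the abstract's "F-measure" is the balanced F1.
    """
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall (in percent) as quoted in the abstract.
gene_protein_f = f_measure(97.2, 95.2)  # gene and protein names
protein_f = f_measure(98.1, 97.5)       # protein names

print(f"gene + protein F1: {gene_protein_f:.2f}")
print(f"protein F1:        {protein_f:.2f}")
```

Under this assumption the recomputed values land within about 0.1 of the reported 96.1 and 97.7. A gap of that size is compatible with P and R having been rounded to one decimal place before publication.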

Bibliographic record

Source
International journal of data mining and bioinformatics | 2014, Issue 3 | 30 pages
Authors
Tohidi Hossein; Ibrahim Hamidah; Murad Masrah Azrifah Azmi
Original format: PDF
Text language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
natural language processing; information extraction; NER; named entity recognition; biomedical
