Journal: Database: The Journal of Biological Databases and Curation (listed under U.S. National Institutes of Health literature)

Gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature



Abstract

Objective: A major challenge in precision medicine is the development of patient-specific genetic biomarkers or drug targets. Firsthand information about the genes associated with the pathologic pathways of interest is buried in the vast biomedical literature. Gene ontology concept recognition (GOCR) is a biomedical natural language processing task that extracts and normalizes mentions of the gene ontology (GO), the controlled vocabulary for gene functions across many species, from biomedical text. Previous GOCR systems, whether rule-based or machine-learning-based, treated GO concepts as separate terms and had no efficient way of sharing common synonyms among the concepts.

Materials and Methods: We used the CRAFT corpus in this study. Targeting the compositional structure of the GO, we introduced the named concept, a basic conceptual unit that has a conserved name and is used in other, more complex concepts. Using named concepts, we separated the GOCR task into a dictionary-matching step and a machine-learning step. By harvesting the surface names used in the training data, we greatly expanded the synonyms of GO concepts through the connections among named concepts, which enhanced the capability to recognize more GO concepts in the text. The source code is available at .

Results: The named concept gene ontology concept recognizer (NCGOCR) achieved 0.804 precision and 0.715 recall by correctly recognizing non-standard mentions of GO concepts.

Discussion: The lack of consensus on GO naming causes diversity in the GO mentions in biomedical manuscripts. The high performance is owed to the stability of the composing GO concepts and the lack of variance in the spelling of named concepts.

Conclusion: NCGOCR reduced the arduous work of GO annotation and improved the process of searching for biomarkers or drug targets, leading to improved biomarker development and greater success in precision medicine.
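
The synonym expansion described under Materials and Methods can be illustrated with a minimal sketch. This is not the NCGOCR implementation: the surface names, the placeholder concept ID `GO:EXAMPLE`, the template format, and the helper names `expand_synonyms` and `match` are all hypothetical, and the machine-learning step is omitted. The sketch only shows how surface names harvested for a few named concepts could multiply the synonyms of a complex concept built from them, and how the result could feed a simple dictionary match.

```python
from itertools import product

# Hypothetical toy data: surface names harvested for each named concept.
SURFACE_NAMES = {
    "regulation": ["regulation", "control"],
    "apoptotic process": ["apoptotic process", "apoptosis"],
}

# Hypothetical complex concept: placeholder ID, a name template,
# and the named concepts that fill the template in order.
COMPLEX_CONCEPTS = {
    "GO:EXAMPLE": ("{} of {}", ["regulation", "apoptotic process"]),
}


def expand_synonyms(complex_concepts, surface_names):
    """Map every generated surface string to the ID of its complex concept."""
    dictionary = {}
    for concept_id, (template, parts) in complex_concepts.items():
        options = [surface_names[part] for part in parts]
        # Every combination of surface names yields one synonym.
        for combo in product(*options):
            dictionary[template.format(*combo)] = concept_id
    return dictionary


def match(text, dictionary):
    """Return sorted (start, end, concept_id) spans for dictionary entries in text."""
    lowered = text.lower()
    hits = []
    for name, concept_id in dictionary.items():
        start = lowered.find(name)
        while start != -1:
            hits.append((start, start + len(name), concept_id))
            start = lowered.find(name, start + 1)
    return sorted(hits)


DICTIONARY = expand_synonyms(COMPLEX_CONCEPTS, SURFACE_NAMES)
HITS = match("Bcl-2 is involved in the control of apoptosis.", DICTIONARY)
```

With two surface names per named concept, the single complex concept gains four synonyms, so the non-standard mention "control of apoptosis" is matched even though it is not the concept's standard name.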
