BMC Bioinformatics

HypertenGene: extracting key hypertension genes from biomedical literature with position and automatically-generated template features

Abstract

Background
The genetic factors leading to hypertension have been studied extensively, and a large number of research papers have been published on the subject. One of the primary tasks of hypertension researchers is to locate key hypertension-related genes in abstracts. However, gathering such information with existing tools is not easy: (1) searching for articles often returns far too many hits to browse through; (2) the search results do not highlight the hypertension-related genes discovered in the abstract; and (3) even though some text-mining services mark up gene names in the abstract, the key genes investigated in a paper are still not distinguished from other genes. To facilitate information gathering for hypertension researchers, one solution is to extract the key hypertension-related genes from each abstract. Building such a system involves three major tasks: (1) gene and hypertension named entity recognition, (2) section categorization, and (3) gene-hypertension relation extraction.

Results
We first compare the retrieval performance achieved by individually adding template features and position features to the baseline system, and then examine the combination of both. Adding position features nearly doubles the baseline AUC (0.8140 vs. 0.4936), whereas adding template features yields only a marginal improvement (0.0197). Including both raises the AUC to 0.8184, indicating that the two feature sets are complementary and do not have overlapping effects. We then examine performance in a different domain, diabetes, where the system achieves a satisfactory AUC of 0.83.

Conclusion
Our approach successfully exploits template features to recognize true hypertension-related gene mentions and position features to distinguish key genes from other related genes. Templates are generated automatically and then checked by biologists, which minimizes labor costs.
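The abstract does not specify what the automatically generated templates look like. As a minimal sketch, assume a template is a word sequence with `<GENE>` and `<DISEASE>` slots, that up to three arbitrary words may separate consecutive template tokens, and that a binary template feature fires when any template matches a sentence for a given candidate gene. The template strings, the disease pattern, the gap size, and the `template_feature` function are all hypothetical illustrations, not the authors' generated templates.

```python
import re

# Hypothetical hand-written templates standing in for the automatically
# generated (and biologist-checked) ones; <GENE>/<DISEASE> are slot markers.
TEMPLATES = [
    "<GENE> polymorphism is associated with <DISEASE>",
    "association of <GENE> with <DISEASE>",
    "<GENE> contributes to <DISEASE>",
]

# Assumed surface forms for the disease slot.
DISEASE_PATTERN = r"(?:essential\s+)?(?:hypertension|high\s+blood\s+pressure)"

# Assumed gap: up to three arbitrary words between consecutive template tokens.
GAP = r"(?:\s+\S+){0,3}?\s+"


def compile_template(template: str, gene: str) -> re.Pattern:
    """Fill the <GENE> slot with a candidate gene and compile the template."""
    parts = []
    for token in template.split():
        if token == "<GENE>":
            # Whole-token match so that e.g. "ACE" does not match inside "ACEI".
            parts.append(r"(?<!\w)" + re.escape(gene) + r"(?!\w)")
        elif token == "<DISEASE>":
            parts.append(DISEASE_PATTERN)
        else:
            parts.append(re.escape(token))
    return re.compile(GAP.join(parts), re.IGNORECASE)


def template_feature(sentence: str, gene: str) -> int:
    """Return 1 if any template matches the sentence for this gene, else 0."""
    return int(any(compile_template(t, gene).search(sentence) for t in TEMPLATES))


s = "The ACE gene I/D polymorphism is associated with essential hypertension."
print(template_feature(s, "ACE"))  # 1
print(template_feature(s, "AGT"))  # 0
```

In a learned system such a 0/1 value would be one input among many to the classifier rather than a decision on its own, which is one way to combine machine learning with pattern matching.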
Our approach integrates the advantages of machine learning models and pattern matching. To the best of our knowledge, this is the first systematic study of extracting hypertension-related genes and the first attempt to create a hypertension-gene relation corpus based on the GAD database. Furthermore, we propose and test novel features for extracting key hypertension genes, such as relative position, section, and template features, which could also be applied to key-gene extraction for other diseases.
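The relative-position and section features are named but not defined in the abstract. One plausible reading, sketched here under stated assumptions: each abstract sentence already carries a section label (standing in for the output of the section-categorization step), and a candidate gene is described by its mention count, the relative positions of its first and last mentions, and one indicator per section it appears in. The `Sentence` class, the four section names, the feature names, and the toy abstract are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Assumed section labels, e.g. from a structured abstract or a sentence classifier.
SECTIONS = ("BACKGROUND", "METHODS", "RESULTS", "CONCLUSIONS")


@dataclass
class Sentence:
    text: str
    section: str  # hypothetical output of the section-categorization step


def mentions(sentence: Sentence, gene: str) -> bool:
    """True if the gene symbol occurs as a whole token in the sentence."""
    return re.search(r"(?<!\w)" + re.escape(gene) + r"(?!\w)", sentence.text) is not None


def position_section_features(sentences: List[Sentence], gene: str) -> Dict[str, float]:
    """Hypothetical relative-position and section features for one candidate gene."""
    hits = [i for i, s in enumerate(sentences) if mentions(s, gene)]
    feats: Dict[str, float] = {"mention_count": float(len(hits))}
    if not hits:
        return feats
    # Relative position in [0, 1]: 0.0 = first sentence, 1.0 = last sentence.
    denom = max(len(sentences) - 1, 1)
    feats["first_rel_pos"] = hits[0] / denom
    feats["last_rel_pos"] = hits[-1] / denom
    # One binary indicator per section in which the gene is mentioned.
    for name in SECTIONS:
        feats["in_" + name.lower()] = float(any(sentences[i].section == name for i in hits))
    return feats


# Toy abstract for illustration only.
abstract = [
    Sentence("Hypertension is a complex trait.", "BACKGROUND"),
    Sentence("We genotyped the AGT and ACE genes in a patient cohort.", "METHODS"),
    Sentence("Only the ACE I/D polymorphism was associated with hypertension.", "RESULTS"),
    Sentence("ACE may be a key hypertension gene.", "CONCLUSIONS"),
]
feats = position_section_features(abstract, "ACE")
print(feats["first_rel_pos"], feats["in_conclusions"])  # 0.3333333333333333 1.0
```

Normalizing by the number of sentences keeps the position values comparable across abstracts of different lengths, which matters if a single model is trained over many papers.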