Journal: BMC Bioinformatics

OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature


Abstract

Background: Single nucleotide polymorphisms (SNPs), among other types of sequence variants, are key elements in genetic epidemiology and pharmacogenomics. While sequence data on genetic variation is available in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in the biomedical literature. Identifying the relevant documents and extracting information from them is hampered by the large size of literature databases and the lack of a widely accepted standard notation for biomedical entities. Automatic systems that identify citations of allelic variants of genes in biomedical texts are therefore required.

Results: Our group has previously reported the development of OSIRIS, a system for retrieving literature about allelic variants of genes (http://ibi.imim.es/osirisform.html). Here we describe a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html) that incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB, a database that collects data on human gene sequence variations. The new entity recognition module uses a pattern-based search algorithm to identify variation terms in the texts and map them to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, we present the application of the system to collecting literature citations for allelic variants of genes related to two diseases, intracranial aneurysm and breast cancer.

Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and is therefore suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. Applying OSIRISv1.2 in combination with controlled vocabularies such as MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs to diseases.
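
The abstract describes a pattern-based search for variation terms in text but does not give the patterns themselves. The sketch below illustrates the general idea only; it is not the OSIRISv1.2 algorithm. The regular expressions, the `PATTERNS` table, and the function `find_variant_mentions` are all assumptions made for this example, and the sketch stops at detecting mentions; it does not attempt the mapping to dbSNP identifiers.

```python
import re

# Three-letter amino acid codes, used to spot protein-level substitutions.
AA3 = (r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|"
       r"Ser|Thr|Trp|Tyr|Val)")

# Hypothetical patterns chosen for illustration; they are NOT the patterns
# used by OSIRISv1.2, which the abstract does not specify.
PATTERNS = {
    # dbSNP reference SNP identifiers such as "rs1801133"
    "rs_id": re.compile(r"\brs\d+\b"),
    # Nucleotide substitutions such as "677C>T", "-765G>C" or "1298A/C"
    "nucleotide_substitution": re.compile(
        r"(?<![\w-])-?\d+\s?[ACGT]\s?(?:>|/|->)\s?[ACGT]\b"
    ),
    # Amino acid substitutions such as "Arg399Gln"
    "protein_substitution": re.compile(rf"\b{AA3}\s?\d+\s?{AA3}\b"),
}


def find_variant_mentions(text):
    """Return (kind, matched_text, start, end) tuples sorted by position."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0), match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    sentence = ("The MTHFR 677C>T polymorphism (rs1801133) and "
                "XRCC1 Arg399Gln were genotyped.")
    for kind, mention, start, end in find_variant_mentions(sentence):
        print(f"{kind}: {mention} [{start}:{end}]")
```

A real system would need many more patterns (single-letter amino acid codes, insertions and deletions, free-text descriptions) and a gene-aware step to resolve each mention to a database entry; the three patterns here are only meant to show why the lack of a standard notation makes recall harder to achieve than precision.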
