Journal: Bioinformatics

tmVar: a text mining approach for extracting sequence variants in biomedical literature


Abstract

Motivation: Text-mining mutation information from the literature has become a critical part of the bioinformatics approach for the analysis and interpretation of sequence variations in complex diseases in the post-genomic era. It has also been used to assist the creation of disease-related mutation databases. Most existing approaches are rule-based and focus on limited types of sequence variations, such as protein point mutations. Thus, extending their extraction scope requires significant manual effort in examining new instances and developing corresponding rules. As such, new automatic approaches are greatly needed for extracting different kinds of mutations with high accuracy.

Results: Here, we report tmVar, a text-mining approach based on conditional random fields (CRF) for extracting a wide range of sequence variants described at protein, DNA and RNA levels according to a standard nomenclature developed by the Human Genome Variation Society. By doing so, we cover several important types of mutations that were not considered in past studies. Using a novel CRF label model and feature set, our method achieves higher performance than a state-of-the-art method on both our corpus (91.4 versus 78.1% in F-measure) and their own gold standard (93.9 versus 89.4% in F-measure). These results suggest that tmVar is a high-performance method for mutation extraction from biomedical literature.

Availability: tmVar software and its corpus of 500 manually curated abstracts are available for download at h
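To make the limitation of rule-based extraction concrete, the following is a minimal sketch of a single hand-written rule that targets protein point mutations only. It is not tmVar's CRF model and is not taken from any existing tool; the pattern, the function name `find_protein_point_mutations` and the sample sentence are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical rule covering protein point mutations only,
# e.g. "V600E" or "p.Gly12Asp". Illustration of a rule-based
# extractor; this is NOT the CRF approach described in the abstract.
AA1 = "ACDEFGHIKLMNPQRSTVWY"
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|"
       "Phe|Pro|Ser|Thr|Trp|Tyr|Val")
PROTEIN_POINT = re.compile(
    rf"\b(?:p\.)?(?:(?:{AA3})\d+(?:{AA3})|[{AA1}]\d+[{AA1}])\b"
)


def find_protein_point_mutations(text):
    """Return every substring that matches the single rule above."""
    return [m.group(0) for m in PROTEIN_POINT.finditer(text)]


# Made-up sentence mixing protein-level and DNA-level mentions.
sample = ("The V600E and p.Gly12Asp variants were found, "
          "along with c.35G>A and c.1521_1523delCTT.")
found = find_protein_point_mutations(sample)
print(found)
```

Under this one rule, the protein-level mentions are matched, while the DNA-level substitution and deletion in the same sentence are missed. Covering them would require writing and validating further patterns by hand, which is the manual effort the abstract refers to.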
