Annotating genes and genomes with DNA sequences extracted from biomedical articles

Abstract

Motivation: Increasing rates of publication and DNA sequencing make the problem of finding relevant articles for a particular gene or genomic region more challenging than ever. Existing text-mining approaches focus on finding gene names or identifiers in English text. These are often not unique and do not identify the exact genomic location of a study.

Results: Here, we report the results of a novel text-mining approach that extracts DNA sequences from biomedical articles and automatically maps them to genomic databases. We find that ∼20% of open access articles in PubMed Central (PMC) have extractable DNA sequences that can be accurately mapped to the correct gene (91%) and genome (96%). We illustrate the utility of data extracted by text2genome from more than 150 000 PMC articles for the interpretation of ChIP-seq data and the design of quantitative reverse transcriptase (RT)-PCR experiments.

Conclusion: Our approach links articles to genes and organisms without relying on gene names or identifiers. It also produces genome annotation tracks of the biomedical literature, thereby allowing researchers to use the power of modern genome browsers to access and analyze publications in the context of genomic data.

Availability and implementation: Source code is available under a BSD license from and results can be browsed and downloaded at .

Supplementary information: Supplementary data are available at Bioinformatics online.
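
The extraction step described in the Results can be illustrated with a minimal sketch. This is not the text2genome implementation: the function name `extract_dna_sequences`, the 18-base threshold, and the restriction to upper-case A/C/G/T are all assumptions made for illustration. It only finds candidate sequences in text; mapping them to a genome would need a separate alignment tool and is not shown.

```python
import re

# Hypothetical threshold (an assumption, not taken from the article):
# shorter runs are too likely to be abbreviations or gene symbols.
MIN_LENGTH = 18

# A run of upper-case nucleotide letters, optionally split by single spaces
# or hyphens, since primers are often printed in groups ("ACGTACGTAC GTACGTACGT").
_CANDIDATE = re.compile(r"[ACGT](?:[ \-]?[ACGT])+")


def extract_dna_sequences(text, min_length=MIN_LENGTH):
    """Return nucleotide strings of at least min_length bases found in text."""
    sequences = []
    for match in _CANDIDATE.finditer(text):
        # Drop the separators so only the bases remain.
        seq = re.sub(r"[^ACGT]", "", match.group(0))
        if len(seq) >= min_length:
            sequences.append(seq)
    return sequences


if __name__ == "__main__":
    sample = "The forward primer 5'-ACGTACGTAC GTACGTACGT-3' targets the CAT gene."
    print(extract_dna_sequences(sample))
```

A pattern this simple will miss lower-case sequences and can absorb a neighbouring upper-case word made only of the letters A, C, G and T, so a real pipeline would need further filtering before any mapping step.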
