
Resolving Prokaryotic Taxonomy without rRNA: Longer Oligonucleotide Word Lengths Improve Genome and Metagenome Taxonomic Classification



Abstract

Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism’s inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomic samples, as many of these samples contain organisms from poorly classified phyla which cannot be easily identified using traditional homology methods, including NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes from across the tree of life, substantially expanding upon previous work. A comprehensive analysis of mononucleotide through nonanucleotide word lengths suggests that longer word lengths substantially improve the classification of DNA fragments across a range of sizes of relevance to high throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the ability of tetranucleotide and heptanucleotide word lengths to taxonomically bin metagenome reads (tetranucleotide signatures are the current standard for oligonucleotide word usage analyses). We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that the application of longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve taxonomic assignment and assembly of metagenomic data.
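The idea of an oligonucleotide signature described in the abstract can be illustrated with a minimal Python sketch. It assumes a signature is simply the normalized frequency vector of all overlapping words of length k, and that a fragment is assigned to the reference with the closest signature by Euclidean distance. The abstract does not specify the authors' normalization, distance measure, or classifier, so the function names `kmer_signature`, `euclidean`, and `nearest_reference` and the nearest-signature rule are illustrative assumptions, not the paper's method.

```python
from collections import Counter
from itertools import product
import math


def kmer_signature(seq, k):
    """Normalized frequencies of all 4**k DNA words of length k.

    Assumption: overlapping windows on one strand only; windows that
    contain characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def euclidean(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_reference(fragment, references, k):
    """Name of the reference whose signature is closest to the fragment's.

    `references` maps a (hypothetical) taxon name to a reference sequence.
    """
    frag_sig = kmer_signature(fragment, k)
    return min(
        references,
        key=lambda name: euclidean(frag_sig, kmer_signature(references[name], k)),
    )


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    refs = {
        "at_rich": "ATATATATATTTAAATAT",
        "gc_rich": "GCGCGGCCGCGCGGGCCC",
    }
    print(nearest_reference("ATTATATA", refs, k=2))
```

The signature has 4^k entries, so a tetranucleotide signature has 256 dimensions and a heptanucleotide signature has 16,384. That growth is one plausible source of the trade-off between prediction accuracy and computational time that the abstract mentions.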

Bibliographic record

  • Journal: other
  • Authors: Eric B. Alsop; Jason Raymond
  • Year (volume), issue: -1(8), 7
  • Year: -1
  • Pages: e67337
  • Total pages: 11
  • Original format: PDF
