Journal: The ISME Journal

Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences

Abstract

Molecular microbial ecology investigations often employ large marker gene datasets, for example, ribosomal RNAs, to represent the occurrence of single-cell genomes in microbial communities. Massively parallel DNA sequencing technologies enable extensive surveys of marker gene libraries that sometimes include nearly identical sequences. Computational approaches that rely on pairwise sequence alignments for similarity assessment, and on de novo clustering with de facto similarity thresholds to partition high-throughput sequencing datasets, constrain the resolution at which microbial communities can be described. Minimum Entropy Decomposition (MED) provides a computationally efficient means to partition marker gene datasets into 'MED nodes', which represent homogeneous operational taxonomic units. By employing Shannon entropy, MED uses only the information-rich nucleotide positions across reads and iteratively partitions large datasets while omitting stochastic variation. When applied to analyses of microbiomes from two deep-sea cryptic sponges, Hexadella dedritifera and Hexadella cf. dedritifera, MED resolved a key Gammaproteobacteria cluster into multiple MED nodes that are specific to different sponges, and revealed that these closely related sympatric sponge species maintain distinct microbial communities. MED analysis of a previously published human oral microbiome dataset also revealed that taxa differing by less than 1% in sequence were distributed across distinct niches in the oral cavity. The information theory-guided decomposition process behind the MED algorithm enables sensitive discrimination of closely related organisms in marker gene amplicon datasets without relying on extensive computational heuristics and user supervision.
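
The core idea described in the abstract (score each aligned nucleotide position by Shannon entropy, then repeatedly split reads on the most variable position while discarding low-abundance fragments as noise) can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the published MED implementation: it assumes reads are already aligned to equal length, and the function names and the thresholds `min_entropy` and `min_abundance` are hypothetical choices for this example rather than MED's actual parameters or defaults.

```python
from collections import Counter
from math import log2


def position_entropies(reads):
    """Shannon entropy H = -sum(p * log2(p)), in bits, at each aligned position."""
    total = len(reads)
    entropies = []
    for i in range(len(reads[0])):
        counts = Counter(read[i] for read in reads)
        entropies.append(-sum((c / total) * log2(c / total) for c in counts.values()))
    return entropies


def decompose(reads, min_entropy=0.2, min_abundance=2):
    """Simplified MED-like decomposition (hypothetical thresholds, illustration only).

    Splits reads on the highest-entropy position until every node's maximum
    positional entropy is at or below min_entropy. Child groups with fewer than
    min_abundance reads are discarded as noise. Returns a list of nodes.
    """
    if not reads:
        return []
    entropies = position_entropies(reads)
    best = max(range(len(entropies)), key=entropies.__getitem__)
    if entropies[best] <= min_entropy:
        return [reads]  # homogeneous enough: keep as a final node
    # Entropy > 0 implies at least two nucleotides here, so every group shrinks.
    groups = {}
    for read in reads:
        groups.setdefault(read[best], []).append(read)
    nodes = []
    for members in groups.values():
        if len(members) >= min_abundance:
            nodes.extend(decompose(members, min_entropy, min_abundance))
    return nodes


if __name__ == "__main__":
    reads = ["ACGTA"] * 3 + ["ACCTA"] * 3 + ["ACGTT"]
    for node in decompose(reads):
        print(node[0], len(node))
    # prints:
    # ACGTA 3
    # ACCTA 3
```

In this toy input the third position (G versus C) has the highest entropy, so it is split first; the single ACGTT read is then separated on the last position and dropped because it falls below `min_abundance=2`, mimicking how low-abundance variation is treated as noise rather than as a new node.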