Source: Methods: A Companion to Methods in Enzymology

Estimating the composition of species in metagenomes by clustering of next-generation read sequences


Abstract

Faster and cheaper sequencing technologies, together with the ability to sequence uncultured microbes collected from any environment, give us an opportunity to distill meaningful information from the millions of new genomic sequences obtained from environmental samples, collectively called a metagenome. Unlike data from conventionally cultured microbes, however, metagenomic data is extremely heterogeneous and noisy. Separating the sequenced genomic fragments that belong to different microbes is therefore essential for successful assembly of microbial genomes. In this paper, we present a novel clustering method for metagenomic datasets. Such datasets have two distinguishing features: (i) similar sequence patterns may exist in different species, and (ii) each species is represented by a different number of individuals. Our method overcomes these obstacles by using a Gaussian mixture model and analysis of mixture profiles, and by taking advantage of genomic signatures extracted from the metagenomic dataset. Unlike conventional clustering methods, in which clusters are discovered through global similarities among data instances, our method builds clusters by combining data instances that share local similarities captured by mixture analysis. By considering shared mixture components, our method is able to create clusters of genomic sequences even when they are globally distinct from each other. We applied our method to an artificial metagenomic dataset comprising 47 million simulated reads from 25 real microbial genomes, and analyzed the resulting clusters in terms of the number of clusters, the number of participating species, and the dominant species in each cluster. Although our approach cannot address all challenges in the field of metagenome sequence clustering, we believe that our method is a step toward these goals. (C) 2014 Elsevier Inc. All rights reserved.
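
The abstract names two ingredients, genomic signatures and clustering through shared mixture components, without giving details. The Python sketch below is only an illustration under stated assumptions and is not the authors' implementation. It assumes the signature is a normalized k-mer frequency vector, and that each read already has a mixture profile (for example, posterior weights from a fitted Gaussian mixture model). The function names, the `threshold` parameter, and the linking rule (two reads are linked if some component has a weight of at least `threshold` in both) are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_signature(read, k=2):
    """Return the normalized k-mer frequency vector of a DNA read.

    Assumption: the genomic signature is a k-mer frequency vector;
    the abstract does not say which signature the authors use.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    # k-mers containing other symbols (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def link_by_shared_components(profiles, threshold=0.3):
    """Group items whose mixture profiles share a strong component.

    profiles: one list of per-component weights per item.
    Hypothetical rule: two items are linked if some component has
    weight >= threshold in both; clusters are the connected groups.
    """
    parent = list(range(len(profiles)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # component index -> first item that uses it
    for item, weights in enumerate(profiles):
        for comp, w in enumerate(weights):
            if w >= threshold:
                if comp in first_seen:
                    parent[find(item)] = find(first_seen[comp])
                else:
                    first_seen[comp] = item

    clusters = {}
    for item in range(len(profiles)):
        clusters.setdefault(find(item), []).append(item)
    return sorted(clusters.values())


# Items 0 and 2 share no strong component, yet both overlap with item 1.
example_profiles = [
    [0.9, 0.1, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(link_by_shared_components(example_profiles))  # [[0, 1, 2], [3]]
```

In this toy input, items 0 and 2 have no strong component in common but end up in the same cluster through item 1, which is one way to read the abstract's claim that sequences can be clustered even when they are globally distinct from each other.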
