ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time

Yijun Sun; Yunpeng Cai


Abstract

Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
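
The abstract's idea of representing each cluster as a probabilistic sequence can be illustrated with a small sketch. It is not the authors' implementation: it assumes the sequences in a cluster are already aligned to equal length (the paper's own alignment operations are not reproduced here), and the names `profile` and `profile_distance`, the `ACGT-` alphabet, and the expected-mismatch distance are all illustrative assumptions.

```python
from collections import Counter

# Assumed alphabet: four nucleotides plus a gap symbol (hypothetical choice).
ALPHABET = "ACGT-"


def profile(aligned_seqs):
    """Summarize a cluster as per-column symbol frequencies.

    Assumes the sequences are pre-aligned and of equal length.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    columns = []
    for col in zip(*aligned_seqs):
        if set(col) - set(ALPHABET):
            raise ValueError("unexpected symbol in column")
        counts = Counter(col)
        columns.append({a: counts.get(a, 0) / len(col) for a in ALPHABET})
    return columns


def profile_distance(p, q):
    """Mean probability that symbols drawn from p and q differ, per column.

    This is one simple choice of distance, not the paper's definition.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    total = 0.0
    for cp, cq in zip(p, q):
        match = sum(cp[a] * cq[a] for a in ALPHABET)
        total += 1.0 - match
    return total / len(p)


cluster_a = profile(["ACGT", "ACGT"])
cluster_b = profile(["ACGA", "ACGA"])
print(profile_distance(cluster_a, cluster_b))
```

Comparing two profiles costs time proportional to the sequence length, regardless of how many sequences each cluster holds, which is the kind of saving the abstract describes when it avoids exhaustive pairwise distances between cluster members.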

Bibliographic record

Source: Nucleic Acids Research, 2011, issue 14, 1 page
Authors: Yijun Sun; Yunpeng Cai
Original format: PDF
Chinese Library Classification: AB
Date added to database: 2022-08-18 19:06:27

