SEED: efficient clustering of next-generation sequences

Abstract

Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational problem for studying the population sizes of DNA/RNA molecules and for reducing the redundancies in NGS data. Currently, most sequence clustering algorithms are limited by their speed and scalability, and thus cannot handle data with tens of millions of reads.

Results: Here, we introduce SEED, an efficient algorithm for clustering very large NGS sets. It joins sequences into clusters that can differ by up to three mismatches and three overhanging residues from their virtual center. It is based on a modified spaced seed method, called block spaced seeds. Its clustering component operates on the hash tables by first identifying virtual center sequences and then finding all their neighboring sequences that meet the similarity parameters. SEED can cluster 100 million short read sequences in <4 h with linear time and memory performance. When SEED was used as a preprocessing tool on genome/transcriptome assembly data, it reduced the time and memory requirements of the Velvet/Oasis assembler for the datasets used in this study by 60–85% and 21–41%, respectively. In addition, the assemblies contained longer contigs than those from non-preprocessed data, as indicated by 12–27% larger N50 values. Compared with other clustering tools, SEED showed the best performance in generating clusters of NGS data similar to true cluster results, with a 2- to 10-fold better time performance. While most of SEED's utilities fall into the preprocessing area of NGS data, our tests also demonstrate its efficiency as a stand-alone tool for discovering clusters of small RNA sequences in NGS data from unsequenced organisms.

Availability: The SEED software can be downloaded for free from this site: .

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.
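The similarity criterion stated in the Results (at most three mismatches and three overhanging residues relative to a virtual center) can be illustrated with a small brute-force check. This is only a sketch under stated assumptions, not SEED's implementation: it uses neither block spaced seeds nor hash tables, the function name `is_neighbor` and its parameters are hypothetical, and it assumes that "overhanging residues" means read residues left unaligned when the read is shifted against the center without gaps.

```python
def is_neighbor(center, read, max_mismatches=3, max_overhang=3):
    """Return True if `read` fits `center` at some ungapped offset.

    Assumption (not taken from the paper): the overhang is the number of
    read residues that fall outside the overlap with the center.
    """
    for shift in range(-max_overhang, max_overhang + 1):
        # shift > 0: the read starts `shift` positions into the center.
        # shift < 0: the read starts `-shift` positions before the center.
        c_start = max(shift, 0)
        r_start = max(-shift, 0)
        overlap = min(len(center) - c_start, len(read) - r_start)
        if overlap <= 0:
            continue
        overhang = len(read) - overlap  # read residues left unaligned
        if overhang > max_overhang:
            continue
        # Count mismatching positions inside the overlapping region only.
        mismatches = sum(
            1
            for a, b in zip(
                center[c_start:c_start + overlap],
                read[r_start:r_start + overlap],
            )
            if a != b
        )
        if mismatches <= max_mismatches:
            return True
    return False
```

A call such as `is_neighbor(center, read)` compares one read against one candidate center. Checking every read against every center this way is quadratic in the number of reads, which is the kind of cost the abstract says the hash-table-based approach avoids.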

Bibliographic details

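Journal: Bioinformatics
Authors: Ergude Bao; Tao Jiang; Isgouhi Kaloshian; Thomas Girke
Year (volume), issue: 2011 (27), 18
Pages: 2502–2509
Total pages: 8
Original format: PDF
Chinese Library Classification: Applied microbiology; biochemical genetics; biochemical pharmacology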

Similar literature

1. SEED: efficient clustering of next-generation sequences [J]. Bao Ergude, Jiang Tao, Kaloshian Isgouhi. Bioinformatics. 2011, No. 18
2. SEED: efficient clustering of next-generation sequences [J]. Thomas Girke. Bioinformatics. 2011, No. 18
3. Metagenome assembly through clustering of next-generation sequencing data using protein sequences [J]. Sim Mikang, Kim Jaebum. Journal of Microbiological Methods. 2015
4. Estimating the number of species in metagenomes by clustering next-generation read sequences [C]. Ho-Sik Seok, Woonyoung Hong, Jaebum Kim. International Conference on Big Data and Smart Computing. 2014
5. Efficient Sequence Clustering and Embedding Algorithms for Large-scale Metagenomics Data [D]. Zheng, Wei. 2019
6. Rapid and efficient human mutation detection using a bench-top next-generation DNA sequencer [O]. Qian Jiang, Tychele Turner, Maria X. Sosa
7. SEED: efficient clustering of next-generation sequences [O]. Bao, Ergude, Jiang, Tao, Kaloshian, Isgouhi. 2011
