De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm

Kristoffer Sahlin; Paul Medvedev

Abstract

Long-read sequencing of transcripts with Pacific Biosciences (PacBio) Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current algorithms for de novo transcript reconstruction from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (to scale) and makes use of quality values (to handle variable error rates). We test isONclust on three simulated and five biological data sets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in terms of overall accuracy, scalability to large data sets, or both.
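The abstract names two design choices without spelling out the procedure. Greedy processing gives scalability, and quality values handle variable error rates. The Python sketch below shows one possible way to combine these two ideas. It is not isONclust's implementation. The following are all illustrative assumptions:

- the Phred+33 quality encoding;
- the ranking score (expected number of error-free k-mers, assuming independent per-base errors);
- the shared-k-mer assignment rule;
- the hypothetical helper names (`error_prob`, `expected_error_free_kmers`, `kmers`, `greedy_cluster`) and the parameters `k` and `min_shared`.

```python
from typing import Dict, List, Set, Tuple


def error_prob(qual_char: str, offset: int = 33) -> float:
    """Convert one Phred quality character (assumed Phred+33) to an error probability."""
    q = ord(qual_char) - offset
    return 10 ** (-q / 10)


def expected_error_free_kmers(qual: str, k: int) -> float:
    """Expected number of k-mers with no sequencing error, assuming independent errors."""
    p_ok = [1.0 - error_prob(c) for c in qual]
    total = 0.0
    for i in range(len(qual) - k + 1):
        prod = 1.0
        for p in p_ok[i:i + k]:
            prod *= p  # probability that every base in this window is correct
        total += prod
    return total


def kmers(seq: str, k: int) -> Set[str]:
    """Set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(reads: Dict[str, Tuple[str, str]], k: int = 5,
                   min_shared: float = 0.5) -> Dict[str, List[str]]:
    """Hypothetical greedy clustering: visit reads from highest to lowest quality score.

    Each read joins the first existing representative with which it shares at least
    `min_shared` of its own k-mers; otherwise it becomes a new representative.
    """
    # Highest expected-quality reads first, so representatives tend to be low-error reads.
    order = sorted(reads, key=lambda r: expected_error_free_kmers(reads[r][1], k),
                   reverse=True)
    reps: List[Tuple[str, Set[str]]] = []  # (representative id, its k-mer set)
    clusters: Dict[str, List[str]] = {}
    for rid in order:
        seq, _qual = reads[rid]
        ks = kmers(seq, k)
        for rep_id, rep_ks in reps:
            if ks and len(ks & rep_ks) / len(ks) >= min_shared:
                clusters[rep_id].append(rid)
                break
        else:  # no representative was similar enough: start a new cluster
            reps.append((rid, ks))
            clusters[rid] = [rid]
    return clusters


if __name__ == "__main__":
    example = {
        "r1": ("ACGTACGTTGCAAGGT", "I" * 16),          # uniformly high quality (Q40)
        "r2": ("ACGTACGTTGCAAGGA", "I" * 14 + "##"),   # low-quality tail
        "r3": ("TTTTGGGGCCCCAAAA", "I" * 16),          # unrelated sequence
    }
    print(greedy_cluster(example))  # {'r1': ['r1', 'r2'], 'r3': ['r3']}
```

Processing reads in decreasing quality order means representatives tend to be reads with few errors, so noisier reads are compared against cleaner anchors. The linear scan over representatives is kept for clarity. A tool aiming to scale would index representatives instead of scanning them one by one.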

Bibliographic details

Source
Journal of computational biology: A journal of computational molecular cell biology | 2020, Issue 4 | 13 pages
Authors
Kristoffer Sahlin; Paul Medvedev
Affiliations
Penn State Univ, Dept. of Computer Science and Engineering, University Park, PA 16802, USA (both authors)
Indexing information
Original format: PDF
Text language: English
Chinese Library Classification: Biomathematical methods
Keywords
algorithms; clustering; long-read sequencing; sequencing data analysis; third-generation sequencing; transcriptomics

Similar documents

1. De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm [J]. Kristoffer Sahlin, Paul Medvedev. Journal of computational biology: A journal of computational molecular cell biology. 2020, Issue 4.
2. A new method for decontamination of de novo transcriptomes using a hierarchical clustering algorithm [J]. Bioinformatics. 2017, Issue 9.
3. Impact of sequencing data filtering on the quality of de novo transcriptome assembly [J]. Yakov Meger, Ekaterina Vodiasova, Anastasiya Lantushenko. E3S Web of Conferences. 2021, Issue a.
4. De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality-Value Based Algorithm [C]. Kristoffer Sahlin, Paul Medvedev. International Conference on Research in Computational Molecular Biology. 2019.
5. Transcriptome De novo assembly, clustering, and annotation of novel transcripts [D]. Pooyaei Mehr, Fatemeh Shaadi. 2013.
6. MIDDAS-M: Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters through the Integration of Genome Sequencing and Transcriptome Data [O]. Myco Umemura, Hideaki Koike, Nozomi Nagano.
7. Figure 2: Twin based A, C, and E estimate comparisons between different greedy algorithms for de novo clustering at a 97% similarity threshold [O].