

De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality-Value Based Algorithm



Abstract

Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms for long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (in order to scale) and makes use of quality values (in order to handle variable error rates). We test isONclust on three simulated and five biological datasets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in overall accuracy, in scalability to large datasets, or in both. Our tool is available at https://github.com/ksahlin/isONclust.
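The abstract names two ideas, greedy assignment and the use of quality values, without giving details. The sketch below illustrates only that general combination and is not isONclust's actual method: reads are ordered by expected error count, each read keeps only the k-mers its Phred scores suggest are error-free, and it joins the first-created cluster with the highest share of those k-mers or starts a new cluster. All function names, the k-mer size, and the thresholds are hypothetical choices made for illustration.

```python
from typing import List, Set, Tuple

Read = Tuple[str, str, str]  # (name, sequence, FASTQ quality string)


def phred_to_error_prob(qual: str, offset: int = 33) -> List[float]:
    """Convert a Phred+33 quality string to per-base error probabilities."""
    return [10 ** (-(ord(c) - offset) / 10) for c in qual]


def confident_kmers(seq: str, qual: str, k: int = 5,
                    max_err: float = 0.1) -> Set[str]:
    """Keep k-mers whose probability of being error-free is >= 1 - max_err."""
    probs = phred_to_error_prob(qual)
    kmers = set()
    for i in range(len(seq) - k + 1):
        p_ok = 1.0
        for p in probs[i:i + k]:
            p_ok *= 1.0 - p
        if p_ok >= 1.0 - max_err:
            kmers.add(seq[i:i + k])
    return kmers


def greedy_cluster(reads: List[Read], k: int = 5, max_err: float = 0.1,
                   min_shared: float = 0.5) -> List[List[str]]:
    """Greedy one-pass clustering (illustrative assumption, not isONclust)."""
    # Process reads with the fewest expected errors first, so that
    # higher-quality reads become cluster representatives.
    ordered = sorted(reads, key=lambda r: sum(phred_to_error_prob(r[2])))
    representatives: List[Set[str]] = []
    clusters: List[List[str]] = []
    for name, seq, qual in ordered:
        kmers = confident_kmers(seq, qual, k, max_err)
        best_idx, best_frac = None, 0.0
        for idx, rep in enumerate(representatives):
            frac = len(kmers & rep) / len(kmers) if kmers else 0.0
            if frac > best_frac:
                best_idx, best_frac = idx, frac
        if best_idx is not None and best_frac >= min_shared:
            clusters[best_idx].append(name)
        else:
            # No sufficiently similar representative: the read seeds a new cluster.
            representatives.append(kmers)
            clusters.append([name])
    return clusters


reads = [
    ("r1", "ACGTACGTTGCA", "I" * 12),
    ("r2", "ACGTACGTTGCA", "I" * 12),
    ("r3", "TTTTGGGGCCCCAAAA", "I" * 16),
]
print(greedy_cluster(reads))
```

Because each read is compared only against one representative per cluster rather than against every other read, the work grows with the number of clusters instead of the number of read pairs, which is the usual motivation for a greedy design.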


Bibliographic record

Source: International Conference on Research in Computational Molecular Biology, 2019, xiv, 337 p. (16 pages)
Authors: Kristoffer Sahlin; Paul Medvedev
Original format: PDF
Chinese Library Classification: Bioengineering (biotechnology)

