Conference: International Conference on Research in Computational Molecular Biology

De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality-Value Based Algorithm


Abstract

Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (in order to scale) and makes use of quality values (in order to handle variable error rates). We test isONclust on three simulated and five biological datasets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches, both in terms of overall accuracy and/or scalability to large datasets. Our tool is available at https://github.com/ksahlin/isONclust.
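
The abstract does not spell out how the greedy, quality-aware clustering works, so the following is only a minimal illustrative sketch of the general idea and not isONclust's actual implementation. It assumes Phred+33 quality strings, ranks reads by their expected number of correct bases, and uses shared k-mer fraction as a stand-in similarity measure. All function names, the k-mer size `k`, and the `min_shared` threshold are hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple


def phred_to_error_prob(qual: str, offset: int = 33) -> List[float]:
    """Convert a quality string (Phred+33 assumed) to per-base error probabilities."""
    return [10 ** (-(ord(c) - offset) / 10) for c in qual]


def read_score(seq: str, qual: str) -> float:
    """Expected number of correct bases; long, accurate reads score highest."""
    return sum(1.0 - p for p in phred_to_error_prob(qual))


def kmers(seq: str, k: int) -> Set[str]:
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(
    reads: List[Tuple[str, str]], k: int = 5, min_shared: float = 0.5
) -> Dict[int, int]:
    """Assign each read index to a cluster id in a single greedy pass.

    Reads are visited from highest to lowest quality score. Each read joins
    the existing cluster whose representative shares the largest fraction of
    its k-mers, provided that fraction reaches min_shared; otherwise the read
    becomes the representative of a new cluster.
    """
    order = sorted(
        range(len(reads)), key=lambda i: read_score(*reads[i]), reverse=True
    )
    representatives: List[Set[str]] = []  # k-mer set of each cluster's first read
    assignment: Dict[int, int] = {}
    for i in order:
        km = kmers(reads[i][0], k)
        best, best_frac = None, 0.0
        for cid, rep in enumerate(representatives):
            frac = len(km & rep) / len(km) if km else 0.0
            if frac > best_frac:
                best, best_frac = cid, frac
        if best is not None and best_frac >= min_shared:
            assignment[i] = best
        else:
            representatives.append(km)
            assignment[i] = len(representatives) - 1
    return assignment


# Toy input: two near-identical reads and one unrelated read.
toy_reads = [
    ("ACGTACGTTAGCCGATAGGC", "I" * 20),  # high quality
    ("ACGTACGTTAGCCGATAGGA", "5" * 20),  # lower quality, one mismatch
    ("TTTTGGGGCCCCAAAATTGG", "I" * 20),  # unrelated sequence
]
toy_assignment = greedy_cluster(toy_reads)
```

Processing the most reliable reads first means cluster representatives tend to be low-error sequences, which is one plausible reason a greedy single pass can remain accurate when error rates vary between reads.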
