Conference: International Symposium on Bioinformatics Research and Applications
Poisson-Markov Mixture Model and Parallel Algorithm for Binning Massive and Heterogenous DNA Sequencing Reads

Abstract

A major computational challenge in analyzing metagenomic sequencing reads is identifying the unknown sources of massive and heterogeneous sets of short DNA reads. A promising approach is to extract and exploit sequence features, i.e., k-mers, efficiently and sufficiently, and to bin the reads according to their sources. Shorter k-mers may capture base composition information, while longer k-mers may represent read abundance information. We present a novel Poisson-Markov Mixture Model (PMM) that systematically integrates the information in both long and short k-mers, and we develop a parallel algorithm that improves both read binning performance and running time. We compare the performance and running time of our PMM approach with selected competing approaches on simulated data sets, and we demonstrate its utility on a time-course metagenomics data set. The probabilistic modeling framework is sufficiently flexible and general to solve a wide range of supervised and unsupervised learning problems in metagenomics.
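To make the two kinds of k-mer features concrete, the following is a minimal Python sketch of counting overlapping k-mers in a single read at a short and a long k. It is an illustration only, not the PMM method itself: the helper names `kmer_counts` and `kmer_frequencies`, the example read, and the choices k = 2 and k = 8 are all assumptions made here, and the abstract does not state which k values the authors use.

```python
from collections import Counter


def kmer_counts(read: str, k: int) -> Counter:
    """Count every overlapping k-mer in one read (hypothetical helper)."""
    read = read.upper()
    # A read shorter than k yields an empty range, hence an empty Counter.
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def kmer_frequencies(read: str, k: int) -> dict:
    """Normalize k-mer counts to relative frequencies (hypothetical helper)."""
    counts = kmer_counts(read, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Example read and k values are illustrative assumptions, not from the paper.
read = "ACGTACGTAC"

# Short k-mers: a composition profile of the read.
short_profile = kmer_frequencies(read, 2)

# Long k-mers: raw counts; summed over many reads, such counts are the kind
# of quantity that could reflect how abundant a source genome is.
long_counts = kmer_counts(read, 8)
```

In a binning setting, profiles like `short_profile` would be computed per read, while long k-mer counts would typically be accumulated across the whole read set before being fed to a model.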