Conference: International Conference on Algorithms and Architectures for Parallel Processing

PPCAS: Implementation of a Probabilistic Pairwise Model for Consistency-Based Multiple Alignment in Apache Spark


Abstract

Large-scale data processing techniques, commonly grouped under the term Big Data, are used to manage the huge amounts of data generated by sequencers. Although these techniques offer significant advantages, few biological applications have adopted them. In bioinformatics, Multiple Sequence Alignment (MSA) tools are widely applied to evolutionary and phylogenetic analysis and to homology and domain structure prediction. Highly rated MSA tools, such as MAFFT, ProbCons and T-Coffee (TC), apply probabilistic consistency as a step prior to the progressive alignment stage in order to improve final accuracy. This paper presents a novel approach named PPCAS (Probabilistic Pairwise model for Consistency-based multiple alignment in Apache Spark). PPCAS is based on the MapReduce processing paradigm so that large datasets can be processed, with the aim of improving the performance and scalability of the original algorithm.
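The abstract does not describe how PPCAS arranges its computation, so the following is only a minimal sketch of how a probabilistic consistency step could be phrased as a map phase followed by a reduce phase. It assumes the ProbCons-style relaxation, in which the score for aligning position i of sequence x with position j of sequence y is averaged over every intermediate sequence z. All names here (`lookup`, `map_phase`, `reduce_phase`, the sparse dictionary layout) are hypothetical, and plain Python generators stand in for Spark operations.

```python
from collections import defaultdict
from itertools import combinations


def transpose(matrix):
    """Swap row and column indices of a sparse posterior matrix."""
    return {(j, i): p for (i, j), p in matrix.items()}


def lookup(post, lengths, a, b):
    """Return the sparse posterior matrix for the ordered pair (a, b).

    A sequence aligned with itself is treated as the identity matrix
    (an assumption borrowed from the ProbCons-style formulation).
    """
    if a == b:
        return {(i, i): 1.0 for i in range(lengths[a])}
    if (a, b) in post:
        return post[(a, b)]
    return transpose(post[(b, a)])


def map_phase(post, lengths):
    """Emit ((x, y, i, j), partial_score) for every intermediate z."""
    names = sorted(lengths)
    for x, y in combinations(names, 2):
        for z in names:
            xz = lookup(post, lengths, x, z)
            zy = lookup(post, lengths, z, y)
            # Index the z-to-y matrix by the position k in z.
            by_k = defaultdict(list)
            for (k, j), q in zy.items():
                by_k[k].append((j, q))
            for (i, k), p in xz.items():
                for j, q in by_k.get(k, ()):
                    yield (x, y, i, j), p * q


def reduce_phase(records, n_seqs):
    """Sum partial scores per key and average over the sequence count."""
    totals = defaultdict(float)
    for key, value in records:
        totals[key] += value
    return {key: total / n_seqs for key, total in totals.items()}


# Toy input: three one-residue sequences with made-up posteriors.
lengths = {"A": 1, "B": 1, "C": 1}
post = {
    ("A", "B"): {(0, 0): 0.5},
    ("A", "C"): {(0, 0): 1.0},
    ("B", "C"): {(0, 0): 1.0},
}
relaxed = reduce_phase(map_phase(post, lengths), len(lengths))
print(relaxed[("A", "B", 0, 0)])
```

In this toy input the A-B score rises from 0.5 to 2/3, because both A and B align confidently with C. In a Spark setting the two phases would plausibly correspond to a `flatMap` followed by `reduceByKey`, but that mapping is an assumption and is not taken from the paper.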
