...
BMC Bioinformatics

MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees

Abstract

Background: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds (RF) distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees. We introduce MrsRF (MapReduce Speeds up RF), a multi-core algorithm that uses the MapReduce paradigm to generate a t × t Robinson-Foulds distance matrix between t trees.

Results: We studied the performance of our MrsRF algorithm on two large biological tree sets, one consisting of 20,000 trees of 150 taxa each and the other of 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach, reaching a speedup of over 18 on 32 total cores. Our results also show that achieving top speedup on a multi-core cluster requires different cluster configurations. Finally, we show how to use an RF matrix to summarize collections of phylogenetic trees visually.

Conclusion: Our results show that MapReduce is a promising paradigm for developing multi-core phylogenetic applications. The results also demonstrate that different multi-core configurations must be tested in order to obtain optimum performance. We conclude that RF matrices play a critical role in developing techniques to summarize large collections of trees.
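To make the t × t Robinson-Foulds matrix concrete, the following is a minimal sequential sketch in Python. It is not the MrsRF algorithm, whose MapReduce phases the abstract does not detail. It assumes each tree is given as nested tuples of leaf names over the same taxon set, and it uses the common definition of RF distance as the number of non-trivial bipartitions found in exactly one of the two trees (some conventions halve this count). The function names `bipartitions`, `rf_distance`, and `rf_matrix` are hypothetical and chosen for illustration only.

```python
def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree given as nested tuples.

    Each bipartition is stored as the frozenset of leaves on the side that
    does NOT contain the alphabetically smallest taxon, so that the same
    split always gets the same representation.
    """
    clades = set()

    def leaves_below(node):
        if isinstance(node, str):          # a leaf is a taxon name
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        clades.add(below)                  # record the clade under this internal node
        return below

    taxa = leaves_below(tree)
    anchor = min(taxa)
    canonical = set()
    for side in clades:
        if anchor in side:
            side = taxa - side             # switch to the side without the anchor taxon
        if 1 < len(side) < len(taxa) - 1:  # drop trivial splits (single leaf or empty)
            canonical.add(side)
    return canonical


def rf_distance(splits_a, splits_b):
    """Unnormalized RF distance: size of the symmetric difference."""
    return len(splits_a ^ splits_b)


def rf_matrix(trees):
    """Build the symmetric t x t RF matrix for a list of t trees."""
    splits = [bipartitions(tree) for tree in trees]
    t = len(trees)
    matrix = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in range(i + 1, t):
            matrix[i][j] = matrix[j][i] = rf_distance(splits[i], splits[j])
    return matrix


if __name__ == "__main__":
    trees = [
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),
        (("C", "D"), ("B", "A")),  # same topology as the first tree
    ]
    for row in rf_matrix(trees):
        print(row)
```

The nested loop makes the quadratic cost in the number of trees explicit: every pair of trees is compared once. This is the pairwise workload that a parallel approach such as the one described in the abstract distributes across cores.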