Conference: 2013 IEEE International Conference on Big Data

Optimizing a MapReduce module of preprocessing high-throughput DNA sequencing data

Abstract

The MapReduce framework has become the de facto choice for big data analysis in a variety of applications. In the MapReduce programming model, computation is distributed across a cluster of computing nodes that run in parallel. The performance of a MapReduce application is thus affected by the system and middleware, the characteristics of the data, and the design and implementation of the algorithms. In this study, we focus on optimizing the performance of a MapReduce application, CloudRS, which tackles the problem of detecting and removing errors in next-generation sequencing de novo genomic data. We present three strategies for communication between the Map() and Reduce() functions: content-exchange, content-grouping, and index-only. The three strategies differ in the way messages are exchanged between the two functions. We also present experimental results comparing the performance of the three strategies.
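To illustrate why the form of the messages passed from Map() to Reduce() can matter, here is a minimal in-memory sketch in Python. It is not CloudRS and does not reproduce the paper's strategies, which the abstract names but does not define. The k-mer counting task, the function names (`map_read`, `shuffle`, `reduce_kmer`, `run`), and the `payload` switch between "content" and "index" are all hypothetical, chosen only to show that the same reduction can be fed by messages of very different size.

```python
from collections import defaultdict


def map_read(read_id, seq, k, payload):
    """Emit one (k-mer, value) message for every k-mer in a read."""
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        # Hypothetical payload choice (an assumption, not the paper's definition):
        # "content" ships the read text with each message, "index" ships only
        # the read id and position.
        if payload == "content":
            value = (read_id, pos, seq)
        else:
            value = (read_id, pos)
        yield kmer, value


def shuffle(messages):
    """Group message values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in messages:
        groups[key].append(value)
    return groups


def reduce_kmer(kmer, values):
    """Count how many messages arrived for one k-mer."""
    return kmer, len(values)


def run(reads, k, payload):
    """Run map, shuffle, and reduce; also report sequence characters shipped."""
    messages = [
        msg
        for read_id, seq in enumerate(reads)
        for msg in map_read(read_id, seq, k, payload)
    ]
    # Rough proxy for communication volume: characters of read text in the values.
    shipped = sum(len(value[2]) for _, value in messages if len(value) == 3)
    counts = dict(reduce_kmer(kmer, vals) for kmer, vals in shuffle(messages).items())
    return counts, shipped
```

Calling `run(["ACGTAC", "CGTACG"], 3, "content")` and `run(["ACGTAC", "CGTACG"], 3, "index")` yields the same k-mer counts, while the second ships no read text at all. In a real cluster the mapper output crosses the network during the shuffle, so a difference of this kind would show up as communication cost.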
