Optimizing a MapReduce module of preprocessing high-throughput DNA sequencing data

Abstract

The MapReduce framework has become the de facto choice for big data analysis in a variety of applications. In the MapReduce programming model, computation is distributed across a cluster of computing nodes that run in parallel. The performance of a MapReduce application is therefore affected by the system and middleware, the characteristics of the data, and the design and implementation of the algorithms. In this study, we focus on the performance optimization of a MapReduce application, CloudRS, which tackles the problem of detecting and removing errors in next-generation sequencing data for de novo genome assembly. We present three strategies for communication between the Map() and Reduce() functions: content-exchange, content-grouping, and index-only. The three strategies differ in the way messages are exchanged between the two functions. We also present experimental results comparing the performance of the three strategies.
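To make concrete what "exchanging messages between Map() and Reduce()" means, here is a minimal single-process Python sketch of the MapReduce model applied to sequencing reads. It is an illustration under stated assumptions, not CloudRS's implementation and not a reproduction of the paper's three strategies: keying reads by k-mers, the value k = 4, and the hypothetical `index_only` toggle (emit a read's index instead of its full sequence) are all choices made here for illustration. The functions `map_reads`, `shuffle`, `reduce_counts`, and `payload_size` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple, Union

# A value is either a read index (int) or a full read sequence (str).
Value = Union[int, str]


def map_reads(reads: List[str], k: int, index_only: bool) -> Iterator[Tuple[str, Value]]:
    """Map(): emit one (k-mer, value) pair for every k-mer in every read.

    With index_only=True the value is the read's position in `reads`
    (a small message); otherwise it is the whole read sequence.
    This toggle is a hypothetical illustration, not CloudRS's API.
    """
    for read_id, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k], (read_id if index_only else seq)


def shuffle(pairs: Iterable[Tuple[str, Value]]) -> Dict[str, List[Value]]:
    """Group emitted values by key, as the framework does between Map() and Reduce()."""
    groups: Dict[str, List[Value]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(groups: Dict[str, List[Value]]) -> Dict[str, int]:
    """Reduce(): count how many occurrences share each k-mer."""
    return {kmer: len(values) for kmer, values in groups.items()}


def payload_size(pairs: Iterable[Tuple[str, Value]]) -> int:
    """Total characters of value payload sent from Map() to Reduce() (keys excluded)."""
    return sum(len(str(value)) for _, value in pairs)


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG", "ACGTTT"]
    groups = shuffle(map_reads(reads, k=4, index_only=True))
    print(groups)
    print(reduce_counts(groups))
    print("index-only payload:", payload_size(map_reads(reads, 4, index_only=True)))
    print("full-content payload:", payload_size(map_reads(reads, 4, index_only=False)))
```

With the three 6-base reads above and k = 4, the mapper emits 9 pairs either way and the reducer produces the same k-mer counts, but the value payload is 9 characters in index-only mode versus 54 when full reads are sent. Character counts are only a crude stand-in for bytes shuffled over a network; the sketch merely shows why the choice of message content can affect performance.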

Bibliographic Record

Source
2013 IEEE International Conference on Big Data, 2013, pp. 1-6 (6 pages)
Conference location: Santa Clara, CA (US)
Authors
Chung Wei-Chun; Chang Yu-Jung; Chen Chien-Chih; Lee Der-Tsai
Author affiliation

Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan, ROC
Original format: PDF
Language: English
Keywords
error correction; genome assembly; MapReduce; next-generation sequencing; optimization

Similar Literature

1. Preprocessing and Storing High-Throughput Sequencing Data [J]. Świercz Aleksandra, Bosak Bartosz, Chłopkowski Marek. Computational Methods in Science and Technology, 2014, No. 1.
2. PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets [J]. Changjin Hong, Solaiappan Manimaran, William Evan Johnson. Cancer Informatics, 2014, Suppl. 1.
3. PyMethylProcess: convenient high-throughput preprocessing workflow for DNA methylation data [J]. Levy Joshua J., Titus Alexander J., Salas Lucas A. Bioinformatics, 2019, No. 24.
4. Optimizing a MapReduce Module of Preprocessing High-Throughput DNA Sequencing Data [C]. Wei-Chun Chung, Yu-Jung Chang, Chien-Chih Chen. IEEE International Conference on Big Data, 2013.
5. Preprocessing Algorithms and Software for Genomic Studies with High-Throughput Sequencing Data [D]. Zhbannikov, Ilya Y. 2015.
6. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data [O]. Yuxin Chen, Yongsheng Chen, Chunmei Shi. 2017.
