Journal: BMC Bioinformatics

An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics


Abstract

Background
Bioinformatics researchers now face the analysis of ultra-large-scale data sets, a problem that will only grow at an alarming rate in coming years. Recent developments in open-source software, namely the Hadoop project and associated software, provide a foundation for scaling to petabyte-scale data warehouses on Linux clusters. They support fault-tolerant, parallelized analysis of such data using a programming style named MapReduce.

Description
This article gives an overview of how the bioinformatics community currently uses Hadoop, a top-level Apache Software Foundation project, and associated open-source software projects. It defines the concepts behind Hadoop and the associated HBase project and describes current bioinformatics software that employs Hadoop. The focus is on next-generation sequencing, the leading application area to date.

Conclusions
Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in next-generation sequencing analysis, and such use is increasing. Two factors drive this. First, Hadoop-based analysis is cost-effective, both on commodity Linux clusters and in the cloud, where data can be uploaded to vendors that have implemented Hadoop/HBase. Second, the MapReduce method is effective and easy to use for parallelizing many data analysis algorithms.
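To make the MapReduce programming style concrete, here is a minimal sketch, not taken from the article, of the map, shuffle/sort and reduce phases applied to a toy next-generation sequencing task: counting k-mers across reads. Several assumptions apply. Everything runs in a single Python process. The sort call stands in for the shuffle that Hadoop performs across a cluster. The k-mer task, the value `K = 3` and the function names `map_read`, `reduce_kmer` and `run_mapreduce` are all hypothetical choices made for illustration.

```python
"""In-process sketch of the MapReduce pattern: counting k-mers in reads.

Assumption: map -> shuffle/sort -> reduce is only simulated here; on a real
Hadoop cluster many mappers and reducers run in parallel and the framework
itself performs the shuffle/sort between them.
"""
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

K = 3  # hypothetical k-mer length, chosen only for illustration


def map_read(read: str, k: int = K) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (k-mer, 1) pair for every k-mer in one read."""
    read = read.strip().upper()
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def reduce_kmer(kmer: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: sum all partial counts emitted for a single k-mer."""
    return kmer, sum(counts)


def run_mapreduce(reads: Iterable[str], k: int = K) -> Dict[str, int]:
    # Map: each read is handled independently, which is what lets
    # this step be spread across many machines.
    pairs = [pair for read in reads for pair in map_read(read, k)]
    # Shuffle/sort: group all values that share the same key.
    pairs.sort(key=itemgetter(0))
    # Reduce: one call per distinct key.
    return dict(
        reduce_kmer(kmer, (count for _, count in group))
        for kmer, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(run_mapreduce(["ACGTAC", "GTACGT"]))
    # prints {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

Both mapper and reducer are pure functions over key/value pairs. This is the property that lets a framework such as Hadoop parallelize them and re-run failed tasks, which underlies the fault tolerance described above.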