Conference on Computing in High Energy and Nuclear Physics

Evaluation of Apache Hadoop for parallel data analysis with ROOT


Abstract

The Apache Hadoop software is a Java-based framework for distributed processing of large data sets across clusters of computers, using the Hadoop file system (HDFS) for data storage and backup and MapReduce as a processing platform. Hadoop is primarily designed for processing large textual data sets that can be processed in arbitrary chunks, and it must be adapted to the use case of processing binary data files, which cannot be split automatically. However, Hadoop offers attractive features in terms of fault tolerance, task supervision and control, multi-user functionality and job management. For this reason, we evaluated Apache Hadoop as an alternative approach to PROOF for ROOT data analysis. Two alternatives for distributing analysis data were discussed: either the data was stored in HDFS and processed with MapReduce, or the data was accessed via a standard Grid storage system (dCache Tier-2) and MapReduce was used only as an execution back-end. The measurements focused, on the one hand, on storing analysis data safely on HDFS with reasonable data rates and, on the other hand, on processing data quickly and reliably with MapReduce. In the evaluation of HDFS, read/write data rates from a local Hadoop cluster were measured and compared to standard data rates from the local NFS installation. In the evaluation of MapReduce, realistic ROOT analyses were used and event rates were compared to PROOF.
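The distinction the abstract draws between text that can be cut into arbitrary chunks and binary files that cannot be split can be illustrated with a small sketch. This is not the authors' implementation: the `Split` record, the `plan_splits` function, and the file names and sizes are hypothetical, and the code only mimics how a scheduler might plan input splits. In Hadoop itself, comparable behaviour is commonly obtained by overriding `isSplitable` in a `FileInputFormat` subclass; the abstract does not say whether the authors did this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """One unit of work handed to a single map task (hypothetical record)."""
    path: str
    offset: int
    length: int


def plan_splits(files, chunk_size, splittable):
    """Plan input splits for a set of files.

    files      -- dict mapping a file path to its size in bytes
    chunk_size -- maximum split length in bytes for splittable input
    splittable -- True for text-like data that may be cut at arbitrary
                  offsets, False for binary files that must stay whole
    """
    splits = []
    for path, size in sorted(files.items()):
        if not splittable:
            # A binary file is processed by exactly one map task.
            splits.append(Split(path, 0, size))
            continue
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            splits.append(Split(path, offset, length))
            offset += length
    return splits


if __name__ == "__main__":
    # Sizes are made up purely for illustration.
    example_files = {"a.root": 250, "b.root": 100}
    for split in plan_splits(example_files, chunk_size=100, splittable=False):
        print(split)
```

With whole-file splits the degree of parallelism is bounded by the number of input files rather than by the total data volume, which is one reason the file layout matters when binary analysis data is processed this way.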
