Conference: International Conference on Big Scientific Data Management

Using Hadoop for High Energy Physics Data Analysis


Abstract

With the development of a new generation of High Energy Physics (HEP) experiments, huge amounts of data are being generated. Efficient parallel algorithms and frameworks, together with high I/O throughput, are key to meeting the scalability and performance requirements of HEP offline data analysis. Although Hadoop has gained a lot of attention from the scientific community for its scalability and its parallel computing framework for large data sets, it is still difficult to run HEP data processing tasks directly on Hadoop. In this paper we investigate how Hadoop can be applied so that HEP jobs run on it transparently. In particular, we discuss a new mechanism that lets HEP software access data in HDFS randomly. Because HDFS is a streaming data store that supports only sequential write and append, it cannot satisfy the random data access that HEP jobs require. The new mechanism allows Map/Reduce tasks to perform random reads and writes on the local file system of the data nodes instead of using the Hadoop data streaming interface, which makes it possible to run HEP jobs on Hadoop. We also develop several MapReduce models for HEP jobs such as Corsika simulation, ARGO detector simulation and Medea++ reconstruction, and a toolkit that lets users submit, query and remove jobs. In addition, we provide a cluster monitoring and accounting system to improve system availability. This work has been in production for HEP experiments since September 2016, delivering about 40,000 CPU hours per month.
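
The difference between streaming access and the local random access described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the fixed-size record layout, the file suffix, and the names `write_events`, `map_task` and `run_demo` are all hypothetical, chosen only to show a map step that receives a local file path and seeks within the file rather than consuming a sequential byte stream.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size event record: event id (uint32) and energy (float64).
# "<" selects little-endian layout without padding, so each record is 12 bytes.
RECORD = struct.Struct("<Id")


def write_events(path, events):
    """Write (event_id, energy) pairs as fixed-size binary records."""
    with open(path, "wb") as f:
        for event_id, energy in events:
            f.write(RECORD.pack(event_id, energy))


def map_task(local_path, wanted_indices):
    """Map step that is handed a local file path instead of a byte stream.

    It seeks directly to each requested record, in any order, which a
    purely sequential streaming interface would not allow.
    """
    results = []
    with open(local_path, "rb") as f:
        for index in wanted_indices:
            f.seek(index * RECORD.size)
            event_id, energy = RECORD.unpack(f.read(RECORD.size))
            results.append((event_id, energy))
    return results


def run_demo():
    """Create a temporary event file and read three records out of order."""
    events = [(i, i * 1.5) for i in range(10)]
    fd, path = tempfile.mkstemp(suffix=".evt")
    os.close(fd)
    try:
        write_events(path, events)
        return map_task(path, [7, 2, 9])
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the task input is only a path, so existing analysis code that expects an ordinary seekable file could be called unchanged inside `map_task`; that is the assumption the example is meant to make visible.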
