首页> 中文期刊>计算机工程 >基于MapReduce的高能物理数据分析系统

基于MapReduce的高能物理数据分析系统

     

摘要

将 MapReduce 思想引入到高能物理数据分析中,提出一个基于 Hadoop 框架的高能物理数据分析系统。通过建立事例的TAG 信息数据库,将需要进一步分析的事例数减少2~3个数量级,从而减轻 I/O 压力,提高分析作业的效率。利用基于 TAG 信息的事例预筛选模型以及事例分析的 MapReduce 模型,设计适用于 ROOT 框架的数据拆分、事例读取、结果合并等 MapReduce 类库。在北京正负电子对撞机实验上进行系统实现后,将其应用于一个8节点实验集群上进行测试,结果表明,该系统可使4×106个事例的分析时间缩短23%,当增加节点个数时,每秒钟能够并发分析的事例数与集群的节点数基本呈正比,说明事例分析集群具有良好的扩展性。%This paper brings the idea of MapReduce parallel processing to high energy physics data analysis, proposes a high energy physics data analysis system based on Hadoop framework. It significantly reduces the number of events that need to do further analysis by 2~3 classes by establishing an event TAG information database, which reduces the I/O volume and improves the efficiency of data analysis jobs. It designs proper MapReduce libs that fit for the ROOT framework to do things such as data splitting, event fetching and result merging by using event pre-selection model based on TAG information and MapReduce model of event analysis. A real system is implemented on BESIII experiment, an 8-nodes cluster is used for data analysis system test, the test result shows that the system shortens the data analyzing time by 23% of 4×106 event, and event number of concurrence analysis per second is higher than cluster nodes when adding more worker nodes, which explains that the case analysis cluster has a good scalability.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号