Journal: Future Generation Computer Systems

A runtime estimation framework for ALICE


Abstract

The European Organization for Nuclear Research (CERN) is the largest research organization for particle physics. ALICE, short for A Large Ion Collider Experiment, is one of the main detectors at CERN and produces approximately 15 petabytes of data each year. The computing associated with an ALICE experiment consists of both online and offline processing: an online cluster retrieves data, while an offline cluster farm performs a broad range of data analysis. Online processing occurs as collision events are streamed from the detector to the online cluster. It compresses and calibrates the data before storing it in a data storage system for subsequent offline processing, e.g., event reconstruction. Because of the large volume of stored data, offline processing seeks to minimize the execution time and data-staging time of the applications via a two-tier offline cluster: the Event Processing Node (EPN) as the first tier and the Worldwide LHC Computing Grid (WLCG) as the second tier. This two-tier cluster requires a smart job scheduler to manage the running of applications efficiently. We therefore propose a runtime estimation method for offline processing in the ALICE environment. Our approach exploits application profiles to predict the runtime of a high-performance computing (HPC) application without the need for any additional metadata. To evaluate the proposed framework, we ran experiments on actual ALICE applications. We also tested how well the method predicts the runtimes of HPC applications on the Amazon EC2 cloud. The results show that our approach generally delivers accurate predictions, i.e., low error percentages.
