IEEE International Conference on e-Science

Distributed and on-demand cache for CMS experiment at LHC



Abstract

In the CMS [1] computing model, the experiment owns dedicated resources around the world, most of which are located in computing centers organized in a well-defined Tier hierarchy. This geo-distributed storage is controlled centrally by CMS Computing Operations. In this architecture, data are distributed and replicated across the centers following a pre-placement model that is largely human controlled, and analysis jobs are mostly executed on computing resources close to the data. This avoids wasting CPU time on I/O latency, but it does not allow the available job slots to be used optimally. The continuous growth of storage requirements (a factor of 20 is foreseen for the HL-LHC era), together with the proliferation of models based on non-owned, on-demand resources (private or public clouds and HPC facilities), is pushing toward a loosening of the CMS computing model, in search of solutions that optimize both the amount of storage centrally managed by computing sites and the CPU efficiency of jobs running on storage-less resources. In fact, most of the Tier 2 storage could be operated as unmanaged cache space, reducing operational cost by a large fraction while increasing flexibility. In this scenario, the cache space behaves like a distributed, shared file system populated with the most requested data; when data are missing locally, access falls back to remote reads. We believe such caches should be clustered by geographical region, in order to leverage the high-bandwidth national networks provided by NRENs and to minimize the amount of data replicated between sites. A geographically distributed caching layer would also suit a data-lake based model in which many satellite computing centers may appear and disappear dynamically. It seems reasonable that a layer protecting the centrally managed storage, together with control over data access latency, will be a key factor. In this
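
The read-through access pattern the abstract describes (serve hot data from a regional cache, fall back to remote access on a miss, and let the cache populate itself instead of relying on pre-placement) can be sketched as follows. This is a minimal Python illustration, not the system presented in the paper; the names RegionalCache, CACHE_DIR, and REMOTE_BASE are hypothetical and not part of the CMS software stack.

```python
# Minimal sketch of a read-through regional cache with remote fallback.
# All names here are hypothetical illustrations of the pattern described
# in the abstract, not the actual CMS caching implementation.

import os
import shutil
import urllib.request

CACHE_DIR = "/var/cache/cms-region"          # hypothetical local cache mount
REMOTE_BASE = "https://storage.example.org"  # hypothetical remote storage endpoint


class RegionalCache:
    """Unmanaged cache space: files appear on first access, no pre-placement."""

    def __init__(self, cache_dir: str = CACHE_DIR, remote_base: str = REMOTE_BASE):
        self.cache_dir = cache_dir
        self.remote_base = remote_base
        os.makedirs(cache_dir, exist_ok=True)

    def open(self, logical_name: str):
        """Return a readable file object, filling the cache on a miss."""
        relative = logical_name.lstrip("/")
        local_path = os.path.join(self.cache_dir, relative)
        if not os.path.exists(local_path):  # cache miss
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            remote_url = f"{self.remote_base}/{relative}"
            # Fall back to remote access; copying the stream populates the
            # cache so later jobs in the same region read locally.
            with urllib.request.urlopen(remote_url) as src, \
                    open(local_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
        return open(local_path, "rb")


if __name__ == "__main__":
    cache = RegionalCache()
    with cache.open("/store/data/example.root") as f:
        payload = f.read()
    print(f"read {len(payload)} bytes")
```

Clustering such caches by region, as the abstract argues, means the remote fallback crosses the wide-area network only on the first access within a region; subsequent jobs hit the shared regional copy.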