Journal of Physics: Conference Series
A study of the applicability of recommender systems for the Production and Distributed Analysis system PanDA of the ATLAS Experiment

Abstract

Scientific computing has advanced in the ways it deals with massive amounts of data, as production capacities have increased significantly over the last decades. Most large science experiments require vast computing and data storage resources to provide results or predictions based on the data obtained. For scientific distributed computing systems with hundreds of petabytes of data and thousands of users, it is important to keep track not just of how data is distributed in the system, but also of individual users' interests in the distributed data (that is, to reveal implicit interconnections between users and data objects). This, however, requires collecting and using specific statistics, such as correlations between data distributions, the mechanics of data distribution, and, above all, user preferences. This work focuses on user activities (specifically, data usage) and interests in one such distributed computing system, PanDA (Production ANd Distributed Analysis system). PanDA is a high-performance workload management system originally designed to meet production and analysis requirements for a data-driven workload at the Large Hadron Collider Computing Grid for the ATLAS Experiment hosted at CERN (the European Organization for Nuclear Research). In this work we investigate whether data collected in the past in PanDA shows trends indicating that users have mutual interests that persist into their subsequent data usage (i.e., data usage patterns), using data mining techniques such as association analysis, sequential pattern mining, and the basics of the recommender system approach. We show that such common interests between users do exist and could therefore be used to provide recommendations (in terms of collaborative filtering) to help users with their data selection process.
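
As an illustration of the collaborative filtering idea mentioned in the abstract, the following is a minimal sketch, not the authors' method. It assumes implicit feedback only (a user either has or has not used a dataset), measures user similarity with the Jaccard index, and scores unseen datasets by the summed similarity of the users who used them. The user names, dataset names, and the helper functions `jaccard` and `recommend` are all hypothetical and do not come from PanDA.

```python
from collections import defaultdict

# Hypothetical usage log: which user accessed which dataset.
# All names are invented for illustration; they are not PanDA records.
usage = {
    "alice": {"data_A", "data_B", "data_C"},
    "bob": {"data_A", "data_B", "data_D"},
    "carol": {"data_E"},
}


def jaccard(a, b):
    """Jaccard similarity between two sets of dataset names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, usage, top_n=3):
    """Rank datasets the user has not used, weighted by similar users' usage."""
    seen = usage[user]
    scores = defaultdict(float)
    for other, items in usage.items():
        if other == user:
            continue
        sim = jaccard(seen, items)
        if sim == 0.0:
            continue  # users with no shared datasets contribute nothing
        for item in items - seen:
            scores[item] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("alice", usage))
```

In this toy log, `alice` and `bob` share two datasets, so the dataset only `bob` has used becomes a candidate for `alice`, while `carol` shares nothing with anyone and receives no recommendations. A real study would need actual usage records and a validated similarity measure.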