Creating personal adaptive clusters for managing scientific jobs in a distributed computing environment

Abstract

We describe a system for creating personal clusters in user space to support the submission and management of thousands of compute-intensive serial jobs on the network-connected compute resources of the NSF TeraGrid. The system implements a robust infrastructure that submits and manages job proxies across a distributed computing environment. These job proxies contribute resources to personal clusters that are created dynamically, on demand, for a user. The system adapts to the prevailing job load conditions at the distributed sites by migrating job proxies to sites expected to provide resources more quickly. The version of the system described in this paper allows users to build large personal Condor and Sun Grid Engine clusters on the TeraGrid. Users can then submit, monitor, and control their scientific jobs through a single uniform interface, using the feature-rich functionality found in these job management environments. Up to 100,000 user jobs have been submitted through the system to date, enabling approximately 900 teraflops of scientific computation.
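The adaptive step mentioned in the abstract (migrating job proxies to sites expected to provide resources more quickly) can be illustrated with a minimal sketch. This is not the system's actual algorithm, which the abstract does not specify. The names `Site`, `JobProxy`, `rebalance`, the `expected_wait_s` estimate, and the `min_gain_s` threshold are all hypothetical, introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    expected_wait_s: float  # hypothetical estimate of queue wait at this site


@dataclass
class JobProxy:
    proxy_id: int
    site: str
    running: bool = False  # a running proxy already contributes resources


def rebalance(proxies, sites, min_gain_s=600.0):
    """Move still-queued proxies to the site with the shortest expected wait.

    Assumption: a proxy is migrated only if doing so is expected to save at
    least `min_gain_s` seconds, to avoid churning between similar sites.
    Returns a list of (proxy_id, old_site, new_site) tuples.
    """
    wait = {s.name: s.expected_wait_s for s in sites}
    best = min(sites, key=lambda s: s.expected_wait_s)
    moved = []
    for p in proxies:
        if p.running:
            continue  # never disturb a proxy that has already started
        if wait[p.site] - best.expected_wait_s >= min_gain_s:
            moved.append((p.proxy_id, p.site, best.name))
            p.site = best.name
    return moved


sites = [Site("A", 7200.0), Site("B", 900.0), Site("C", 1200.0)]
proxies = [
    JobProxy(1, "A"),
    JobProxy(2, "A", running=True),
    JobProxy(3, "C"),
    JobProxy(4, "B"),
]
moved = rebalance(proxies, sites)
```

In this toy run only proxy 1 is moved: proxy 2 is already running, proxy 3 would gain less than the threshold, and proxy 4 is already at the fastest site. A real deployment would also need to cancel the queued proxy at the old site and resubmit it at the new one, a step this sketch leaves out.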
