
Job Management Requirements for NAS Parallel Systems and Clusters


Abstract

A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high-performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling that we have encountered on two parallel computers: a 160-node IBM SP2 and a cluster of 20 high-performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing them according to difficulty and importance, and advocating a return to fundamental issues.
