Journal: Future Generation Computer Systems

Lark: An effective approach for software-defined networking in throughput computing clusters

Abstract

High-throughput computing (HTC) systems are widely adopted in scientific discovery and engineering research. They are responsible for scheduling submitted batch jobs to utilize the cluster resources. Current systems mostly focus on managing computing resources such as CPU and memory; however, they lack flexible and fine-grained management mechanisms for network resources. This need has become increasingly urgent, as current batch systems may be distributed across dozens of sites around the globe, as in the Open Science Grid. The Lark project was motivated by this need to re-examine how the HTC layer interacts with the network layer. In this paper, we present the system architecture of Lark and its implementation as a plugin for HTCondor, a popular HTC software project. Lark achieves lightweight network virtualization at per-job granularity for HTCondor by utilizing Linux containers and virtual Ethernet devices; this provides each batch job with a unique network address in a private network namespace. We extended HTCondor's description language, ClassAds, so that users can specify networking requirements in the job submission script. HTCondor can then perform matchmaking to make sure that user-specified network requirements and resource-specific policies are fulfilled. We also extended the job agent, condor_starter, so that it can manage and configure the job's network environment. With this building block as the core, we implement bandwidth management functionality at both the host and network levels utilizing software-defined networking (SDN). In addition to HTCondor, we design and implement wide-area network bandwidth management for GridFTP traffic. Our experiments and evaluations show that Lark can effectively manage network resources simultaneously for both applications inside the cluster environment. By not resorting to heavyweight VMs, we keep startup overheads minimal compared to "regular" batch jobs. This mechanism provides users with better predictability of their job execution and gives administrators more policy flexibility in the allocation of network resources.
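The per-job network namespace and virtual Ethernet (veth) pair described in the abstract can be illustrated with a minimal sketch. This is not Lark's code: the abstract does not say how Lark sets up the namespace, and the function name `veth_setup_commands`, the `job-`/`veth-` naming scheme, and the addresses are all hypothetical. The sketch only builds the standard iproute2 command lines that would give one job a private namespace with its own address. It does not run them, since doing so requires root on a Linux host.

```python
from typing import List


def veth_setup_commands(job_id: str, host_ip: str, job_ip: str,
                        prefix: int = 30) -> List[List[str]]:
    """Build iproute2 commands that give one job a private namespace and address.

    All names are hypothetical; this only illustrates the general technique.
    """
    ns = f"job-{job_id}"          # network namespace for this job
    host_if = f"veth-h{job_id}"   # veth end that stays in the host namespace
    job_if = f"veth-j{job_id}"    # veth end that moves into the job's namespace

    # Linux interface names are limited to 15 characters.
    for name in (host_if, job_if):
        if len(name) > 15:
            raise ValueError(f"interface name too long: {name}")

    in_ns = ["ip", "netns", "exec", ns]  # prefix to run a command inside the namespace
    return [
        ["ip", "netns", "add", ns],
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", job_if],
        ["ip", "link", "set", job_if, "netns", ns],
        ["ip", "addr", "add", f"{host_ip}/{prefix}", "dev", host_if],
        ["ip", "link", "set", host_if, "up"],
        in_ns + ["ip", "addr", "add", f"{job_ip}/{prefix}", "dev", job_if],
        in_ns + ["ip", "link", "set", job_if, "up"],
        in_ns + ["ip", "link", "set", "lo", "up"],
        in_ns + ["ip", "route", "add", "default", "via", host_ip],
    ]


if __name__ == "__main__":
    # Print the commands for a hypothetical job 42 instead of executing them.
    for cmd in veth_setup_commands("42", "10.0.0.1", "10.0.0.2"):
        print(" ".join(cmd))
```

Because each job ends up with its own address on its own interface, host-level tools can attribute and shape traffic per job, which is the property the bandwidth management described above relies on.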
