ACM Transactions on Modeling and Performance Evaluation of Computing Systems

Scale-Out vs Scale-Up: A Study of ARM-based SoCs on Server-Class Workloads



Abstract

ARM 64-bit processing has generated enthusiasm for developing ARM-based servers targeted at both data centers and supercomputers. In addition to server-class components and hardware advancements, the ARM software environment has grown substantially over the past decade. Major development ecosystems and libraries have been ported and optimized to run on ARM, making ARM suitable for server-class workloads. Available ARM SoCs follow two trends: mobile-class SoCs, which rely on heterogeneous integration of a mix of CPU cores, GPGPU streaming multiprocessors (SMs), and other accelerators; and server-class SoCs, which instead integrate a larger number of CPU cores and several IO accelerators but offer no GPGPU support. For scaling the number of processing cores, there are two different paradigms: mobile-class SoCs use a scale-out architecture, a cluster of simpler systems connected over a network, while server-class ARM SoCs use a scale-up solution, leveraging symmetric multiprocessing to pack a large number of cores onto a single chip. In this article, we present the ScaleSoC cluster, a scale-out solution built from mobile-class ARM SoCs. ScaleSoC leverages fast network connectivity and GPGPU acceleration to improve performance and energy efficiency over previous ARM scale-out clusters. To study both scaling paradigms, we consider a wide range of modern server-class parallel workloads, including latency-sensitive transactional workloads, MPI-based CPU and GPGPU-accelerated scientific applications, and emerging artificial intelligence workloads. We study in depth the performance and energy efficiency of ScaleSoC compared to server-class ARM SoCs and discrete GPGPUs.
We quantify the network overhead on the performance of ScaleSoC and show that packing a large number of ARM cores on a single chip does not necessarily guarantee better performance, because shared resources, such as the last-level cache, become performance bottlenecks. We characterize the GPGPU-accelerated workloads and demonstrate that for applications that can leverage the better CPU-GPGPU balance of the ScaleSoC cluster, performance and energy efficiency improve compared to discrete GPGPUs.
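The trade-off described above can be illustrated with a toy Amdahl-style strong-scaling model (a sketch for intuition only, not the article's methodology; all parameter values are illustrative assumptions): scale-out pays a per-node network-communication overhead, while scale-up pays a per-core penalty from contention on shared resources such as the last-level cache.

```python
# Toy strong-scaling model contrasting the two paradigms in the abstract.
# serial_frac, net_overhead, and llc_contention are illustrative
# assumptions, not measurements from the article.

def scale_out_speedup(nodes, serial_frac=0.05, net_overhead=0.02):
    """Amdahl-style speedup where each added node adds a fixed network
    overhead, expressed as a fraction of single-node runtime."""
    parallel_frac = 1.0 - serial_frac
    return 1.0 / (serial_frac + parallel_frac / nodes
                  + net_overhead * (nodes - 1))

def scale_up_speedup(cores, serial_frac=0.05, llc_contention=0.015):
    """Same model, but the overhead grows with on-chip cores contending
    for shared resources such as the last-level cache."""
    parallel_frac = 1.0 - serial_frac
    return 1.0 / (serial_frac + parallel_frac / cores
                  + llc_contention * (cores - 1))

if __name__ == "__main__":
    for n in (1, 4, 16, 32):
        print(f"{n:2d} nodes/cores: scale-out {scale_out_speedup(n):.2f}x, "
              f"scale-up {scale_up_speedup(n):.2f}x")
```

Under either model, speedup peaks and then degrades as the overhead term grows with core or node count, which mirrors the abstract's observation that more cores on one chip do not necessarily mean better performance.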
