Conference: IEEE International Parallel and Distributed Processing Symposium

Scaling Irregular Applications through Data Aggregation and Software Multithreading



Abstract

Emerging applications in areas such as bioinformatics, data analytics, semantic databases and knowledge discovery employ datasets from tens to hundreds of terabytes. Currently, only distributed memory clusters have enough aggregate space to enable in-memory processing of datasets of this size. However, in addition to their large size, the data structures used by these new application classes are usually characterized by unpredictable, fine-grained accesses; that is, they exhibit irregular behavior. Traditional commodity clusters, by contrast, rely on cache-based processors and high-bandwidth networks optimized for locality, regular computation and bulk communication. For these reasons, irregular applications are inefficient on these systems and require custom, hand-coded optimizations to scale in both performance and size. Two techniques are key to improving the performance of large-scale irregular applications on commodity clusters: lightweight software multithreading, which tolerates data access latencies by overlapping network communication with computation, and aggregation, which reduces overheads and increases bandwidth utilization by coalescing fine-grained network messages. In this paper we describe GMT (Global Memory and Threading), a runtime system library that couples software multithreading and message aggregation with a Partitioned Global Address Space (PGAS) data model to enable higher performance and scaling of irregular applications on multi-node systems. We present the architecture of the runtime, explaining how it is designed around these two critical techniques. We show that irregular applications written using our runtime can outperform, even by orders of magnitude, the corresponding applications written using other programming models that do not exploit these techniques.
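As a rough illustration of the aggregation idea described in the abstract, the sketch below buffers small messages per destination node and sends them as one larger batch. It is not GMT's API or implementation: the `Aggregator` class, the `send` callback, the `max_batch_bytes` threshold, and the `demo` function are all hypothetical names chosen for this example, and the "network" is simulated with a Python list.

```python
from collections import defaultdict


class Aggregator:
    """Illustrative only: coalesces small messages per destination into batches."""

    def __init__(self, send, max_batch_bytes=64):
        self.send = send                        # hypothetical callback: send(dest, payload)
        self.max_batch_bytes = max_batch_bytes  # assumed flush threshold
        self.buffers = defaultdict(list)        # dest -> pending messages
        self.sizes = defaultdict(int)           # dest -> pending byte count

    def put(self, dest, msg):
        # Flush first if this message would overflow the current batch.
        if self.sizes[dest] and self.sizes[dest] + len(msg) > self.max_batch_bytes:
            self.flush(dest)
        self.buffers[dest].append(msg)
        self.sizes[dest] += len(msg)
        # Flush as soon as the batch is full.
        if self.sizes[dest] >= self.max_batch_bytes:
            self.flush(dest)

    def flush(self, dest):
        # Send all pending messages for one destination as a single payload.
        if self.buffers[dest]:
            self.send(dest, b"".join(self.buffers[dest]))
            self.buffers[dest] = []
            self.sizes[dest] = 0

    def flush_all(self):
        # Drain any partially filled batches.
        for dest in list(self.buffers):
            self.flush(dest)


def demo():
    sent = []  # stands in for the network: records (dest, payload) per send
    agg = Aggregator(lambda dest, payload: sent.append((dest, payload)),
                     max_batch_bytes=64)
    # 32 fine-grained 8-byte messages, alternating between two destinations.
    for i in range(32):
        agg.put(i % 2, i.to_bytes(8, "little"))
    agg.flush_all()
    return sent


if __name__ == "__main__":
    batches = demo()
    print(len(batches), "network sends instead of 32")
```

In this toy setting the 32 fine-grained messages reach the simulated network as a handful of larger payloads, which is the overhead reduction the abstract attributes to aggregation. A real runtime would also need a timeout-based flush so that partially filled batches do not wait indefinitely.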
