IEEE Transactions on Parallel and Distributed Systems

Improving Execution Concurrency of Large-Scale Matrix Multiplication on Distributed Data-Parallel Platforms

Abstract

Matrix multiplication is a dominant but very time-consuming operation in many big data analytic applications, so optimizing its performance is an important and fundamental research issue. The performance of large-scale matrix multiplication on distributed data-parallel platforms is determined by both computation and IO costs. With existing matrix multiplication execution strategies, once the execution concurrency scales above a threshold, performance deteriorates quickly because the increase in IO cost outweighs the decrease in computation cost. This paper presents a novel parallel execution strategy, CRMM (Concurrent Replication-based Matrix Multiplication), along with a parallel algorithm, Marlin, for large-scale matrix multiplication on data-parallel platforms. The CRMM strategy exploits higher execution concurrency for sub-block matrix multiplication at the same IO cost. To further improve the performance of Marlin, we also propose a number of novel system-level optimizations, including increasing the concurrency of local data exchange by calling the native library in batch, reducing the overhead of block matrix transformation, and reducing disk-heavy shuffle operations by exploiting the semantics of matrix computation. We have implemented Marlin as a library, along with a set of related matrix operations, on Spark, and have contributed Marlin to the open-source community. For large-sized matrix multiplication, Marlin outperforms existing systems including Spark MLlib, SystemML and SciDB on average. An evaluation on a real-world DNN workload also indicates that Marlin outperforms the above systems.
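
To make the phrase "sub-block matrix multiplication" concrete, the following is a minimal single-machine sketch of generic blocked matrix multiplication in plain Python. It is not CRMM and not Marlin's code; the function names and the split parameters `m`, `k`, `n` are illustrative assumptions. It only shows where the concurrency comes from: splitting A into an m x k grid and B into a k x n grid yields m*k*n independent block products, whose partial results are then summed over the shared index.

```python
from itertools import product


def matmul(a, b):
    """Plain dense product of two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][c] for t in range(inner)) for c in range(cols)]
            for row in a]


def split(matrix, row_parts, col_parts):
    """Cut a matrix into a row_parts x col_parts grid of equally sized blocks."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % row_parts or cols % col_parts:
        raise ValueError("dimensions must divide evenly in this sketch")
    bh, bw = rows // row_parts, cols // col_parts
    return [[[r[j * bw:(j + 1) * bw] for r in matrix[i * bh:(i + 1) * bh]]
             for j in range(col_parts)]
            for i in range(row_parts)]


def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def blocked_matmul(a, b, m, k, n):
    """Multiply a by b using an (m x k) grid for a and a (k x n) grid for b.

    Returns the product and the number of independent block products.
    """
    a_blocks = split(a, m, k)
    b_blocks = split(b, k, n)

    # Each (i, j, l) triple is an independent task; on a data-parallel
    # platform these are the units that could run concurrently.
    partials = {
        (i, j, l): matmul(a_blocks[i][l], b_blocks[l][j])
        for i, j, l in product(range(m), range(n), range(k))
    }

    # Sum the k partial products that belong to the same output block.
    c_blocks = [[None] * n for _ in range(m)]
    for i, j in product(range(m), range(n)):
        acc = partials[(i, j, 0)]
        for l in range(1, k):
            acc = add(acc, partials[(i, j, l)])
        c_blocks[i][j] = acc

    # Stitch the grid of output blocks back into one matrix.
    result = []
    for block_row in c_blocks:
        for r in range(len(block_row[0])):
            result.append([v for blk in block_row for v in blk[r]])
    return result, len(partials)
```

In this sketch, raising `m`, `k`, or `n` increases the number of independent tasks but also the number of blocks that must be moved and the partial results that must be combined, which is the computation-versus-IO trade-off the abstract describes.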
