Venue: International Conference on Very Large Data Bases

A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics


Abstract

The growing use of statistical and machine learning (ML) algorithms to analyze large datasets has given rise to new systems to scale such algorithms. But implementing new scalable algorithms in low-level languages is a painful process, especially for enterprise and scientific users. To mitigate this issue, a new breed of systems exposes high-level bulk linear algebra (LA) primitives that are scalable. By composing such LA primitives, users can write analysis algorithms in a higher-level language, while the system handles scalability issues. But there is little work on a unified comparative evaluation of the scalability, efficiency, and effectiveness of such "scalable LA systems." We take a major step towards filling this gap. We introduce a suite of LA-specific tests based on our analysis of the data access and communication patterns of LA workloads and their use cases. Using our tests, we perform a comprehensive empirical comparison of a few popular scalable LA systems (MADlib, MLlib, SystemML, ScaLAPACK, SciDB, and TensorFlow) using both synthetic data and a large real-world dataset. Our study has revealed several scalability bottlenecks, unusual performance trends, and even bugs in some systems. Our findings have already led to improvements in SystemML, with other systems' developers also expressing interest.
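
As a rough illustration of what "composing LA primitives" can look like, the sketch below writes ordinary least-squares regression using only bulk operations (transpose, matrix multiply, linear solve). It is a single-node NumPy example under our own assumptions, not code from the paper and not one of the evaluated systems; the function name ols_normal_equations and the synthetic data are hypothetical.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Hypothetical helper: least-squares weights via the normal equations,
    expressed purely as bulk linear algebra primitives."""
    gram = X.T @ X                      # transpose + matrix-matrix multiply
    rhs = X.T @ y                       # transpose + matrix-vector multiply
    return np.linalg.solve(gram, rhs)   # dense linear solve

# Synthetic, noise-free data (an assumption made only for this illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
true_w = np.arange(1.0, 6.0)            # weights 1.0 through 5.0
y = X @ true_w

w = ols_normal_equations(X, y)
```

In a scalable LA system the same three operations would presumably be written in that system's own language, with the system deciding how to partition X and distribute the multiplications; the user-facing algorithm stays at this level of abstraction.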
