Journal: Concurrency, practice and experience

Performance enhancement for iterative data computing with in-memory concurrent processing

Abstract

The big data era has driven the development of many data analysis tools. Spark is an in-memory processing tool suited to iterative and interactive data mining. It delivers higher data-processing performance than MapReduce, which relies on offline (disk-based) storage. However, in-memory processing has drawbacks: its massive in-memory data requirements cause cross-node data transfers that result in long computation times. Performance can be improved if the in-memory process executes fewer shuffle instructions. This study therefore aims to enhance the performance of iterative applications through instruction replacement. Three empirical cases with diverse datasets and iteration counts are used to modify the programs. We adopt a strategy of downloading a small resilient distributed dataset (RDD) and replacing the shuffle-included instructions to shorten the processing time, with the code replacement automated through exhaustive code matching. The experimental results show an improvement of up to 39% in execution time over the existing in-memory processing programs across various dataset sizes.
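To make the idea of replacing a shuffle-included instruction more concrete, here is a minimal sketch in plain Python. It is not the authors' code and does not use the Spark API. It is a toy model that assumes "downloading a small RDD" means collecting the small dataset and copying it to every partition, so that a join can run locally instead of repartitioning both sides by key. The function names (`hash_partition`, `shuffle_join`, `broadcast_join`), the data layout, and the "records moved" cost measure are all hypothetical.

```python
from collections import defaultdict


def hash_partition(parts, num_parts):
    """Redistribute (key, value) records so equal keys share a partition.

    Returns the new partitions and how many records changed partition.
    Integer keys are assumed so that `key % num_parts` is deterministic.
    """
    out = [[] for _ in range(num_parts)]
    moved = 0
    for src, part in enumerate(parts):
        for key, value in part:
            dst = key % num_parts
            if dst != src:
                moved += 1
            out[dst].append((key, value))
    return out, moved


def build_lookup(records):
    """Group values by key for local lookups."""
    lookup = defaultdict(list)
    for key, value in records:
        lookup[key].append(value)
    return lookup


def shuffle_join(large, small):
    """Join after repartitioning BOTH datasets by key (the shuffle)."""
    n = len(large)
    large_s, moved_large = hash_partition(large, n)
    small_s, moved_small = hash_partition(small, n)
    rows = []
    for lpart, spart in zip(large_s, small_s):
        lookup = build_lookup(spart)
        rows.extend((k, (v, w)) for k, v in lpart for w in lookup.get(k, []))
    return rows, moved_large + moved_small


def broadcast_join(large, small):
    """Join by copying the small dataset to every partition of the large one.

    The large dataset never leaves its partitions; the only transfer
    counted is one full copy of the small dataset per partition.
    """
    small_records = [rec for part in small for rec in part]
    lookup = build_lookup(small_records)
    moved = len(small_records) * len(large)
    rows = [(k, (v, w)) for part in large for k, v in part for w in lookup.get(k, [])]
    return rows, moved


# Hypothetical data: 1000 records in 4 partitions, joined to 8 category names.
large = [[(j % 8, j) for j in range(i * 250, (i + 1) * 250)] for i in range(4)]
small = [[(k, f"category-{k}") for k in range(8)]]

rows_s, moved_s = shuffle_join(large, small)
rows_b, moved_b = broadcast_join(large, small)
print("same result:", sorted(rows_s) == sorted(rows_b))
print("records moved, shuffle join:", moved_s)
print("records moved, broadcast join:", moved_b)
```

Both joins produce the same rows, but in this model the broadcast variant moves only copies of the small dataset, while the shuffle variant relocates most of the large one. The trade-off only pays off when the small dataset fits comfortably in each worker's memory.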