Facilitating High Performance Code Parallelization.

Abstract

With the surge of social media on one hand, and the ease of obtaining information through cheap sensing devices and open source APIs on the other, the amount of data available for processing is increasing vastly. At the same time, the growing importance of transforming data into knowledge in today's data-driven world has driven a shift in computing towards massively parallel distributed systems. Pattern matching lies at the core of data analysis for all sorts of applications, so pattern matching algorithms must be parallelized efficiently to cope with this ever-increasing abundance of data. We propose a method that automatically detects a user's single-threaded function call that searches for a pattern using Java's standard regular expression library, and uses Java bytecode injection to replace it with our own data-parallel implementation. Our approach facilitates parallel processing on shared memory systems (using multithreading and NVIDIA GPUs) and on distributed systems (using MPI and Hadoop). The major contributions of our implementation are that it reduces execution time while remaining transparent to the user. In the same spirit of facilitating high performance code parallelization, we also present a tool that automatically generates Spark Java code from minimal user-supplied inputs. Spark has emerged as the tool of choice for efficient big data analysis, yet users still have to learn its complicated API in order to write even a simple application. Our tool is easy to use, interactive, and offers the performance of Spark's native Java API. To the best of our knowledge, no such tool had been implemented at the time of this writing.
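To make the idea of swapping a single-threaded regular expression search for a data-parallel one more concrete, here is a minimal sketch. It is not the dissertation's implementation: the class name `ParallelRegexCount`, both method names, and the line-based chunking strategy are assumptions made for illustration, and the bytecode injection step is not shown. The sketch also assumes that no match spans or depends on a line boundary, so each line can be searched independently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; names and chunking strategy are assumptions.
final class ParallelRegexCount {

    // Sequential baseline: the kind of single-threaded java.util.regex
    // search that the abstract describes detecting and replacing.
    public static long countSequential(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        long count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Data-parallel variant: split the input into lines, give each worker a
    // contiguous block of lines, and sum the per-block match counts.
    // Assumes matches never cross or depend on line boundaries.
    public static long countParallel(Pattern pattern, String text, int threads) {
        String[] lines = text.split("\n", -1);
        int chunk = Math.max(1, (lines.length + threads - 1) / threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < lines.length; start += chunk) {
                final int from = start;
                final int to = Math.min(lines.length, start + chunk);
                // Pattern is immutable and safe to share; each call to
                // countSequential creates its own Matcher.
                Callable<Long> task = () -> {
                    long c = 0;
                    for (int i = from; i < to; i++) {
                        c += countSequential(pattern, lines[i]);
                    }
                    return c;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while matching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("error");
        String log = "error at 1\nok\nerror at 2\nerror error\n";
        System.out.println(countSequential(p, log));
        System.out.println(countParallel(p, log, 3));
    }
}
```

Under the stated assumption, both methods return the same count for the same input. Patterns that can match across newlines, or that use anchors such as `^` and `$` without `Pattern.MULTILINE`, would need a different way of splitting the input.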
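Bibliographic details

  • Author

    Abi Saad, Maria.

  • Affiliation

    Syracuse University.

  • Degree-granting institution: Syracuse University.
  • Subject: Computer engineering.
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 151 p.
  • Total pages: 151
  • Original format: PDF
  • Language: English (eng)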