Journal: IEEE Letters of the Computer Society

Runtime On-Stack Parallelization of Dependence-Free For-Loops in Binary Programs



Abstract

With the multicore trend, the need for automatic parallelization is more pronounced, especially for legacy and proprietary code where no source code is available and/or the code is already running and restarting is not an option. In this paper, we engineer a mechanism that transforms, at runtime, a frequently executed for-loop with no data dependencies in a binary program into a parallel loop, using on-stack replacement. The mechanism requires no source code, no debugging information, no static instrumentation or information, and no restart of the program. It is implemented using the Padrone binary modification system and pthreads, and it executes the remaining iterations of the loop in parallel. It preserves the running program's state by extracting the targeted loop into a separate function and copying the current stack frame into the corresponding frames of the created threads. An initial study is conducted on a set of kernels from the Polybench workload. Experimental results show speedups of 2× to 3.5× from sequential to parallelized code on four cores, which is similar to source-code-level parallelization.
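The idea of handing the remaining iterations to threads that each start from a copy of the interrupted loop's state can be illustrated at source level. The sketch below is only an analogy under stated assumptions: the paper operates on binary code through Padrone, whereas this example is ordinary C, and `loop_frame`, `loop_worker`, `run_remaining_in_parallel`, the loop body, and the even chunking policy are all hypothetical names and choices made for illustration.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NUM_THREADS 4 /* assumption: one thread per core on a four-core machine */

/* Hypothetical stand-in for the live stack frame of the interrupted loop. */
typedef struct {
    double *a;       /* output array written by the loop */
    const double *b; /* input array read by the loop */
    double scale;    /* loop-invariant local variable */
    size_t i;        /* induction variable when the loop was interrupted */
    size_t n;        /* loop bound */
} loop_frame;

/* The loop, extracted into a separate function that runs on a private frame copy. */
static void *loop_worker(void *arg) {
    loop_frame *f = arg;
    for (; f->i < f->n; ++f->i)
        f->a[f->i] = f->scale * f->b[f->i]; /* iterations are independent */
    return NULL;
}

/* Split the remaining iterations [frame->i, frame->n) into contiguous chunks.
 * Each thread receives its own copy of the frame with adjusted bounds.
 * Returns the number of threads actually created; a chunk whose thread could
 * not be created is executed inline so that all iterations still complete. */
int run_remaining_in_parallel(const loop_frame *frame) {
    pthread_t tid[NUM_THREADS];
    loop_frame copy[NUM_THREADS];
    int created[NUM_THREADS] = {0};
    size_t remaining = frame->n > frame->i ? frame->n - frame->i : 0;
    size_t chunk = (remaining + NUM_THREADS - 1) / NUM_THREADS;
    int count = 0;

    for (int t = 0; t < NUM_THREADS; ++t) {
        size_t lo = frame->i + (size_t)t * chunk;
        size_t hi = lo + chunk;
        if (lo > frame->n) lo = frame->n;
        if (hi > frame->n) hi = frame->n;

        memcpy(&copy[t], frame, sizeof *frame); /* copy the "stack frame" */
        copy[t].i = lo;
        copy[t].n = hi;

        if (pthread_create(&tid[t], NULL, loop_worker, &copy[t]) == 0) {
            created[t] = 1;
            ++count;
        } else {
            loop_worker(&copy[t]); /* fallback: run this chunk sequentially */
        }
    }
    for (int t = 0; t < NUM_THREADS; ++t)
        if (created[t])
            pthread_join(tid[t], NULL);
    return count;
}
```

A caller that has already executed the first few iterations sequentially would fill a `loop_frame` with the current values of its locals and pass it to `run_remaining_in_parallel`. Copying the frame gives every thread private scalars while the arrays stay shared, which is safe here only because the loop has no cross-iteration data dependencies.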
