Runtime On-Stack Parallelization of Dependence-Free For-Loops in Binary Programs

Marwa Yusuf; Ahmed El-Mahdy; Erven Rohou

Abstract

With the multicore trend, the need for automatic parallelization is more pronounced, especially for legacy and proprietary code where no source code is available and/or the code is already running and restarting is not an option. In this paper, we engineer a mechanism for transforming, at runtime, a frequently executed for-loop with no data dependencies in a binary program into a parallel loop, using on-stack replacement. With our mechanism, there is no need for source code, debugging information, or restarting the program. The mechanism also needs no static instrumentation or information. It is implemented using the Padrone binary modification system and pthreads, and the remaining iterations of the loop are executed in parallel. The mechanism preserves the running program's state by extracting the targeted loop into a separate function and copying the current stack frame into the corresponding frames of the created threads. An initial study is conducted on a set of kernels from the Polybench workload. Experimental results show speedups of 2× to 3.5× from sequential to parallelized code on four cores, which is similar to source-code-level parallelization.
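The idea in the abstract (extract the loop into a function, give each thread a copy of the current frame, and run only the remaining iterations in parallel) can be sketched at source level. This is a minimal illustration in C with pthreads, not the authors' implementation: the actual mechanism rewrites a running binary through Padrone, which is not shown here. The names `frame_t`, `loop_fn`, `parallelize_remaining`, and `run_demo`, the array-scaling loop body, and the even chunking of iterations are all assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define N 1000          /* problem size (assumed for the demo) */
#define NUM_THREADS 4   /* matches the four cores mentioned in the abstract */

/* Hypothetical stand-in for the interrupted function's stack frame. */
typedef struct {
    double *a;        /* output array */
    const double *b;  /* input array */
    double scale;     /* a loop-invariant local */
    int i;            /* induction variable: next iteration to run */
    int end;          /* exclusive loop bound */
} frame_t;

/* The loop "extracted" into its own function; it works on a private frame copy. */
static void *loop_fn(void *arg)
{
    frame_t *f = arg;
    for (; f->i < f->end; f->i++)
        f->a[f->i] = f->scale * f->b[f->i];  /* no cross-iteration dependence */
    return NULL;
}

/* Run the remaining iterations [cur->i, cur->end) on NUM_THREADS threads. */
static void parallelize_remaining(const frame_t *cur)
{
    pthread_t tid[NUM_THREADS];
    frame_t frames[NUM_THREADS];
    int spawned[NUM_THREADS];
    int remaining, chunk;

    assert(cur->i <= cur->end);
    remaining = cur->end - cur->i;
    chunk = (remaining + NUM_THREADS - 1) / NUM_THREADS;  /* ceiling division */

    for (int t = 0; t < NUM_THREADS; t++) {
        int lo = cur->i + t * chunk;
        int hi = lo + chunk;
        if (lo > cur->end) lo = cur->end;
        if (hi > cur->end) hi = cur->end;

        /* Copy the current frame, then narrow it to this thread's range. */
        memcpy(&frames[t], cur, sizeof *cur);
        frames[t].i = lo;
        frames[t].end = hi;

        spawned[t] = (pthread_create(&tid[t], NULL, loop_fn, &frames[t]) == 0);
        if (!spawned[t])
            loop_fn(&frames[t]);  /* fall back to running this chunk inline */
    }
    for (int t = 0; t < NUM_THREADS; t++)
        if (spawned[t])
            pthread_join(tid[t], NULL);
}

/* Run k iterations sequentially, switch to parallel mid-loop, and return
 * the number of elements whose result differs from the sequential answer. */
int run_demo(int k)
{
    static double a[N], b[N];
    int bad = 0;

    for (int j = 0; j < N; j++) { a[j] = 0.0; b[j] = (double)j; }

    frame_t cur = { a, b, 2.0, 0, N };
    for (; cur.i < k && cur.i < cur.end; cur.i++)   /* sequential phase */
        a[cur.i] = cur.scale * b[cur.i];

    parallelize_remaining(&cur);                    /* parallel phase */

    for (int j = 0; j < N; j++)
        if (a[j] != 2.0 * b[j]) bad++;
    return bad;
}
```

Calling `run_demo(250)` from a `main` function (compiled with `-pthread`) should return 0: the first 250 iterations run sequentially and the remaining 750 are split across four threads. Copying the frame per thread is what lets each thread keep its own induction variable while sharing the arrays, which is safe here only because the iterations are independent.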


Bibliographic details

Source
IEEE Letters of the Computer Society, 2019, No. 1, pp. 1-4 (4 pages)
Authors
Marwa Yusuf; Ahmed El-Mahdy; Erven Rohou
Author affiliations

Benha University Egypt;

Alexandria University Alexandria Egypt;

INRIA CNRS IRISA Univ Rennes Paris France;

Original format: PDF
Language: English
Keywords
Runtime; Kernel; Benchmark testing; Multicore processing; Debugging; Binary codes; Tools;

Similar literature

1. Towards automatic binary runtime loop de-parallelization using on-stack replacement [J]. Yusuf Marwa, El-Mahdy Ahmed, Rohou Erven. Information Processing Letters. 2019, May issue.
2. Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime [J]. Alcides Fonseca, Bruno Cabral, Joao Rafael. International Journal of Parallel Programming. 2016, No. 6.
3. Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs [J]. Kong Martin, Pop Antoniu, Pouchet Louis-Noel. ACM Transactions on Architecture and Code Optimization. 2014, No. 4.
4. Parameterised Indexed FOR-Loops in Genetic Programming and Regular Binary Pattern Strings [C]. Gayan Wijesinghe, Vic Ciesielski. Simulated Evolution and Learning. 2008.
5. Easier Parallel Programming with Provably-Efficient Runtime Schedulers [D]. Utterback, Robert. 2017.
6. Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime [O]. Mark Murphy, Marcus Alley, James Demmel.
7. Runtime, Speculative On-Stack Parallelization of For-Loops in Binary Programs [O]. Marwa Yusuf, Ahmed El-Mahdy, Erven Rohou. 2018.
8. Parallelizing Molecular Dynamics Programs for Distributed Memory Machines: An Application of the CHAOS Runtime Support Library [R]. Hwang, Y., Das, R., Saltz, J. 1994.
