An Efficient Programming Skeleton for Clusters of Multi-Core Processors

Mina Hosseini Rad; Ahmad Patooghy; Mahdi Fazeli

Abstract

This paper proposes a divide-and-conquer skeleton that aids parallel system programmers by (1) reducing programming complexity, (2) shortening programming time, and (3) enhancing code efficiency. To do this, the proposed skeleton exploits three mechanisms: (1) work-stealing, (2) communication/computation overlapping, and (3) architectural awareness. With the work-stealing mechanism, when a processing core reaches a low-load condition, it fetches a new job from the waiting queue of another core. The second mechanism uses special threads to enable the skeleton to overlap computations with communications. The third mechanism considers the architectural parameters of the host system (e.g., size of the L1 cache, network bandwidth, and network latency) to best match a divide-and-conquer problem to the skeleton. To evaluate the proposed skeleton, three benchmarks (merge sort, fast Fourier transform, and standard matrix multiplication) are developed both with the skeleton and with customized programming. Experiments are done in both simulation and real implementation environments: the resulting six codes are simulated using the COTSon simulator and also implemented on a real system with 28 dual-core processors. Simulation results show an average speed-up of 12.6% for the proposed skeleton compared to customized programming; the speed-up obtained in the real environment is 9.6%. Furthermore, programming aided by the proposed skeleton is at least 70% faster than custom programming, and this difference grows as the program volume increases.
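To make the idea of a divide-and-conquer skeleton concrete, the following is a minimal sketch in Python, not the authors' implementation. The function `dc_skeleton`, its callback parameters, and the `threshold` and `parallel_depth` knobs are all hypothetical names chosen for illustration. The `threshold` parameter only stands in for the kind of grain-size tuning that architectural awareness might drive (for example, sizing base cases to fit a cache); work-stealing and communication/computation overlapping are not modeled here. Python threads are used purely to show the structure and should not be expected to give a real speed-up.

```python
import threading
from typing import Callable, List, TypeVar

P = TypeVar("P")  # problem type
S = TypeVar("S")  # solution type


def dc_skeleton(
    problem: P,
    is_base: Callable[[P], bool],
    solve_base: Callable[[P], S],
    divide: Callable[[P], List[P]],
    combine: Callable[[List[S]], S],
    parallel_depth: int = 2,
) -> S:
    """Generic divide-and-conquer skeleton (hypothetical interface).

    Subproblems run in separate threads for the first `parallel_depth`
    levels of recursion; deeper levels recurse sequentially.
    """
    if is_base(problem):
        return solve_base(problem)

    parts = divide(problem)

    if parallel_depth <= 0:
        # Sequential recursion once the parallel budget is used up.
        return combine(
            [dc_skeleton(p, is_base, solve_base, divide, combine, 0) for p in parts]
        )

    results: list = [None] * len(parts)

    def worker(index: int, part: P) -> None:
        results[index] = dc_skeleton(
            part, is_base, solve_base, divide, combine, parallel_depth - 1
        )

    threads = [
        threading.Thread(target=worker, args=(i, p)) for i, p in enumerate(parts)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return combine(results)


def merge(sorted_parts: List[list]) -> list:
    """Merge exactly two sorted lists into one sorted list."""
    left, right = sorted_parts
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(data: list, threshold: int = 4) -> list:
    """Merge sort expressed as an instance of the skeleton.

    `threshold` is an assumed grain-size parameter: inputs at or below
    this length are sorted directly instead of being divided further.
    """
    grain = max(1, threshold)  # guard against non-terminating division
    return dc_skeleton(
        data,
        is_base=lambda xs: len(xs) <= grain,
        solve_base=sorted,
        divide=lambda xs: [xs[: len(xs) // 2], xs[len(xs) // 2:]],
        combine=merge,
    )
```

With this shape, the user supplies only the four problem-specific callbacks, and the recursion and thread management stay inside the skeleton, which is the separation of concerns the abstract attributes to skeleton-based programming.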

Bibliographic details

Source
International Journal of Parallel Programming | 2018, Issue 6 | pp. 1094-1109 | 16 pages
Authors
Mina Hosseini Rad; Ahmad Patooghy; Mahdi Fazeli
Author affiliations

School of Computer Science, Institute for Research in Fundamental Sciences (IPM);

School of Computer Science, Institute for Research in Fundamental Sciences (IPM), and Department of Computer Engineering, Iran University of Science and Technology;

Department of Computer Engineering, Iran University of Science and Technology;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Cluster computing; Divide and conquer; Multi-core processor; Parallel programming; Skeleton;

Similar literature

1. Price-Performance Aspects of Accelerating the FDTD Method Using the Vector Processing Programming Paradigm on GPU and Multi-Core Clusters [J]. Ilgner Robert G., Davidson David B. Applied Computational Electromagnetics Society journal. 2014, Issue 5
2. Asynchronous migration for parallel genetic programming on a computer cluster with multi-core processors [J]. Shingo Kurose, Kunihito Yamamori, Masaru Aikawa. Artificial life and robotics. 2012, Issue 4
3. Efficient Implementation of OFDM Inner Receiver on a Programmable Multi-Core Processor Platform [J]. Wenhua FAN, Chen CHEN, Yun CHEN. IEICE Transactions on Communications. 2012, Issue 4
4. Efficient Execution of SkePU Skeleton Programs on the Low-Power Multicore Processor Myriad2 [C]. Sebastian Thorarensen, Rosandra Cuello, Christoph Kessler. Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. 2016
5. Efficient Defense Against Covert and Side Channel Attack on Multi-Core Processor Using Signal Processing Techniques [D]. Fang, Hongyu. 2021
6. Energy Efficient Image/Video Data Transmission on Commercial Multi-Core Processors [O]. Sungju Lee, Heegon Kim, Yongwha Chung. 2012
7. Efficient program scheduling for heterogeneous multi-core processors [O]. Jian Chen, Lizy K. John. 2013
