Cascaded execution: Speeding up unparallelized execution on shared-memory multiprocessors

Abstract

Both inherently sequential code and limitations of analysis techniques prevent full parallelization of many applications by parallelizing compilers. Amdahl's Law tells us that as parallelization becomes increasingly effective, any unparallelized loop becomes an increasingly dominant performance bottleneck. We present a technique for speeding up the execution of unparallelized loops by cascading their sequential execution across multiple processors: only a single processor executes the loop body at any one time, and each processor executes only a portion of the loop body before passing control to another. Cascaded execution allows otherwise idle processors to optimize their memory state for the eventual execution of their next portion of the loop, resulting in significantly reduced overall loop body execution times. We evaluate cascaded execution using loop nests from wave5, a Spec95fp benchmark application, and a synthetic benchmark. Running on a PC with 4 Pentium Pro processors and an SGI Power Onyx with 8 R10000 processors, we observe overall speedups of 1.35 and 1.7, respectively, for the wave5 loops we examined and speedups as high as 4.5 for individual loops. Our extrapolated results using the synthetic benchmark show a potential for speedups as large as 16 on future machines.
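
The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `cascaded_run`, the chunking scheme, and the use of `threading.Event` to pass control are all assumptions made for illustration, and the "helper phase" here merely copies a chunk's operands into a local list as a stand-in for warming a processor's cache. Because of Python's global interpreter lock, the sketch shows only the hand-off pattern and does not reproduce the reported speedups.

```python
import threading


def cascaded_run(data, num_workers=4, chunk_size=4):
    """Compute running sums of `data` with a cascaded (token-passing) loop.

    The loop carries a dependency (the running total), so chunks must
    execute strictly in order; only the staging step overlaps.
    """
    n = len(data)
    chunks = [(s, min(s + chunk_size, n)) for s in range(0, n, chunk_size)]
    # turn[k] is set once chunk k is allowed to execute.
    turn = [threading.Event() for _ in range(len(chunks) + 1)]
    turn[0].set()  # the first chunk may start immediately
    out = [0] * n
    carry = [0]  # loop-carried state, touched only by the current token holder

    def worker(wid):
        # Worker `wid` owns chunks wid, wid + num_workers, wid + 2*num_workers, ...
        for k in range(wid, len(chunks), num_workers):
            lo, hi = chunks[k]
            # Helper phase (hypothetical stand-in for prefetching):
            # stage this chunk's operands while another worker executes.
            staged = list(data[lo:hi])
            # Execution phase: wait until control is passed to this chunk.
            turn[k].wait()
            acc = carry[0]
            for i, x in enumerate(staged):
                acc += x
                out[lo + i] = acc
            carry[0] = acc
            # Pass control to whichever worker owns the next chunk.
            turn[k + 1].set()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Calling `cascaded_run(list(range(1, 11)), num_workers=3, chunk_size=2)` yields the same running sums as a plain sequential loop, since only one worker ever executes the loop body at a time.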

Bibliographic details

Source: The 7th International Power Engineering Conference, 2005. IPEC 2005 | 2005 | pp. 714-719 | 6 pages
Authors: Anderson R.E.; Nguyen T.D.; Zahorjan J.
Original format: PDF
Chinese Library Classification: Automation technology, computer technology

Similar literature

1. Speeding up test execution with increased cache locality [J]. Stratis Panagiotis, Rajan Ajitha. Software Testing, Verification and Reliability. 2018, No. 5
2. Architecture for speeding up program execution with cloud technology [J]. Huang Tzu-Chi, Shieh Ce-Kuen, Chilamkurti Naveen. Journal of Supercomputing. 2016, No. 9
3. Speeding up Spatial Database Query Execution using GPUs [J]. Bogdan Simion, Suprio Ray, Angela Demke Brown. Procedia Computer Science. 2012, No. 1
4. Cascaded execution: Speeding up unparallelized execution on shared-memory multiprocessors [C]. Anderson, R.E., Nguyen, . 1999
5. Semantically ordered parallel execution of multiprocessor programs [D]. Gupta, Gagan. 2015
6. A Switch from a Gradient to a Threshold Mode in the Regulation of a Transcriptional Cascade Promotes Robust Execution of Meiosis in Budding Yeast [O]. Vyacheslav Gurevich, Yona Kassir. 2008
7. Cascaded Execution: Speeding Up Unparallelized Execution on Shared-Memory Multiprocessors [O]. Ruth Anderson Thu, Ruth E. Anderson, Thu D. Nguyen, 1998
