Optimized Unrolling of Nested Loops

Abstract

In this paper, we address the problems of automatically selecting unroll factors for perfectly nested loops, and generating compact code for the selected unroll factors. Compared to past work, the contributions of our work include a) a more detailed cost model that includes ILP and I-cache considerations, b) a new code generation algorithm for unrolling nested loops that generates more compact code (with fewer remainder loops) than the unroll-and-jam transformation, and c) a new algorithm for efficiently enumerating feasible unroll vectors. Our experimental results confirm the wide applicability of our approach by showing a 2.2X speedup on matrix multiply, and an average 1.08X speedup on seven of the SPEC95fp benchmarks (with a 1.2X speedup for two benchmarks). These speedups are significant because the baseline compiler used for comparison is the IBM XL Fortran product compiler, which generates high-quality code with unrolling and software pipelining of innermost loops enabled. Larger performance improvements due to unrolling of nested loops can be expected on processors that have more registers and higher degrees of instruction-level parallelism than the processor used for our measurements (PowerPC 604).

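The abstract contrasts the new code generator with the conventional unroll-and-jam transformation, which leaves remainder loops whenever a trip count is not a multiple of its unroll factor. As a point of reference, here is a minimal sketch of plain unroll-and-jam on a matrix multiply, assuming a square n x n problem and an unroll factor of 2 on both outer loops. It shows the baseline transformation, not the paper's code-generation algorithm, and the function names are hypothetical.

```python
def matmul_naive(a, b, n):
    """Reference triple loop: C = A * B for n x n lists of lists."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_unroll_jam_2x2(a, b, n):
    """Hypothetical example: unroll i and j by 2, jam the copies into one k loop.

    The four scalars c00..c11 play the role of accumulators that a compiler
    would keep in registers for the whole k loop.
    """
    c = [[0.0] * n for _ in range(n)]
    n_even = n - n % 2  # largest multiple of the unroll factor 2 that is <= n

    def dot(i, j):
        # Un-unrolled body, used only by the remainder loops.
        acc = 0.0
        for k in range(n):
            acc += a[i][k] * b[k][j]
        return acc

    # Main unrolled-and-jammed nest: one k pass computes a 2x2 block of C.
    for i in range(0, n_even, 2):
        for j in range(0, n_even, 2):
            c00 = c01 = c10 = c11 = 0.0
            for k in range(n):
                a0, a1 = a[i][k], a[i + 1][k]  # each load is reused twice
                b0, b1 = b[k][j], b[k][j + 1]
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            c[i][j], c[i][j + 1] = c00, c01
            c[i + 1][j], c[i + 1][j + 1] = c10, c11

    # Remainder loop nest 1: leftover column j for the unrolled rows.
    for i in range(n_even):
        for j in range(n_even, n):
            c[i][j] = dot(i, j)
    # Remainder loop nest 2: leftover row i, all columns.
    for i in range(n_even, n):
        for j in range(n):
            c[i][j] = dot(i, j)
    return c
```

Each pass of the jammed k loop uses every loaded element of `a` and `b` twice, which is the register-level reuse that makes unrolling outer loops worthwhile. The price is the two remainder nests at the end, and their number grows with the number of loops that are unrolled; that code growth is what the abstract's more compact code generator aims to reduce.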

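The abstract also mentions a cost model with ILP and I-cache considerations and an algorithm for efficiently enumerating feasible unroll vectors. What follows is a deliberately simplified, hypothetical illustration of that idea, not the paper's model or enumeration algorithm. For a matrix-multiply nest whose two outer loops are unrolled by (u1, u2), it assumes that u1*u2 accumulators plus u1 + u2 loaded operands must fit in registers, uses the body size u1*u2 as a crude stand-in for I-cache pressure, and scores each vector by loads per multiply-add, (u1 + u2)/(u1*u2). The limits `num_regs` and `max_body` and all function names are assumptions.

```python
from fractions import Fraction


def feasible_unroll_vectors(num_regs, max_body):
    """Yield every (u1, u2) that fits the toy register and body-size limits.

    Hypothetical model: u1 * u2 accumulators plus u1 + u2 loaded operands
    must fit in num_regs registers, and the unrolled body (u1 * u2 copies of
    the original statement) must not exceed max_body, a crude I-cache proxy.
    """
    u1 = 1
    # With u2 = 1 the demands are 2*u1 + 1 registers and u1 copies; once
    # that fails, every larger u1 fails too, so the outer scan can stop.
    while 2 * u1 + 1 <= num_regs and u1 <= max_body:
        u2 = 1
        # Both demands grow with u2, so stop at the first infeasible u2.
        while u1 * u2 <= max_body and u1 * u2 + u1 + u2 <= num_regs:
            yield (u1, u2)
            u2 += 1
        u1 += 1


def loads_per_fma(u1, u2):
    """Toy cost: operand loads per multiply-add in the unrolled body."""
    return Fraction(u1 + u2, u1 * u2)


def select_unroll_vector(num_regs, max_body):
    """Pick the cheapest feasible vector; ties go to the smaller body.

    Raises ValueError if not even (1, 1) is feasible.
    """
    return min(
        feasible_unroll_vectors(num_regs, max_body),
        key=lambda u: (loads_per_fma(*u), u[0] * u[1], u),
    )
```

With 32 registers (matching the 32 floating-point registers of the PowerPC architecture) and a body limit of 16 copies, this toy model selects (4, 4); raising the body limit to 64 makes register pressure the binding constraint, and it selects (4, 5). The early exits rely on both constraints growing monotonically in u1 and u2, which is what keeps the enumeration from scanning every candidate pair.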
Bibliographic details

Source
14th ACM International Conference on Supercomputing, May 8-11, 2000, Santa Fe, New Mexico | 2000 | pp. 153-166 | 14 pages
Conference location: Santa Fe, NM (US)
Author
Vivek Sarkar
Affiliation

IBM Research, Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598

Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology

Similar documents

1. Optimized Unrolling of Nested Loops [J]. Vivek Sarkar. International Journal of Parallel Programming, 2001, No. 5.
2. Swarm-inspired exploration of architecture and unrolling factors for nested-loop-based application in architectural synthesis [J]. Mishra V., Sengupta A. Electronics Letters, 2015, No. 2.
3. Research of Register Pressure Aware Loop Unrolling Optimizations for Compiler [J]. Xuehua Liu, Liping Ding, Yanfeng Li. MATEC Web of Conferences, 2018, No. 1.
4. PSDSE: Particle Swarm Driven Design Space Exploration of Architecture and Unrolling Factors for Nested Loops in High Level Synthesis [C]. Mishra Vipul Kumar, Sengupta Anirban. International Symposium on Electronic System Design, 2014.
5. Affine loop optimization based on modulo unrolling in Chapel [D]. Sharma, Aroon. 2014.
6. Scapular kinematic reconstruction – segmental optimization multibody optimization with open-loop or closed-loop chains: which one should be preferred? [O]. Benjamin Michaud, Sonia Duprey, Mickaël Begon. 2017.
7. Optimization to Prevent Cache Penalty by Loop Partition and Loop Unrolling [O]. Liu Li, Chen Yu, Qiao Lin. 2015.
