International Journal of Computer Trends and Technology

Automated Enhanced Parallelization of Sequential C to Parallel OpenMP

Abstract

The paper presents work towards implementing a technique that enhances the parallel execution of auto-generated OpenMP programs by taking the architecture of on-chip cache memory into account, thereby achieving higher performance. It avoids false sharing in for-loops by generating OpenMP code that dynamically schedules chunks, placing each core's data one cache line size apart. Most parallelization tools have been found not to deal with significant multicore issues such as false sharing, which can degrade performance. An open-source parallelization tool called Par4All (Parallel for All), which internally makes use of the PIPS (Parallelization Infrastructure for Parallel Systems) and PoCC (Polyhedral Compiler Collection) integration, has been analyzed and its power has been unleashed to achieve maximum hardware utilization. The work focuses only on optimizing the parallelization of for-loops, since loops are the most time-consuming parts of code. The performance of the generated OpenMP programs has been analyzed on different architectures using Intel VTune Performance Analyzer. Some of the computationally intensive programs from PolyBench have been tested with different data sets, and the results reveal that the OpenMP code generated by the enhanced technique achieves considerable speedup. The deliverables include the automation tool, test cases, the corresponding OpenMP programs, and performance analysis reports.
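
The idea of dynamically scheduling chunks one cache line apart can be illustrated with a minimal sketch. It is not the code the tool generates, which the abstract does not show; it only assumes a 64-byte data cache line, and the names `CACHE_LINE_BYTES`, `chunk_for`, and `scale` are hypothetical. The chunk size is chosen so that each dynamically scheduled chunk covers one full cache line of `double` elements, so two threads should not write to the same line as long as the array itself starts on a cache-line boundary.

```c
#include <stddef.h>

/* Assumption: 64-byte data cache lines. Check the actual value on the target CPU. */
#define CACHE_LINE_BYTES 64

/* Hypothetical helper: how many elements of the given size fill one cache line. */
static int chunk_for(size_t elem_size)
{
    size_t n = CACHE_LINE_BYTES / elem_size;
    return n > 0 ? (int)n : 1;   /* elements larger than a line get chunk size 1 */
}

/* Multiply every element of a[0..n-1] by k, in parallel when OpenMP is enabled. */
static void scale(double *a, long n, double k)
{
    int chunk = chunk_for(sizeof(double));   /* 8 doubles per 64-byte line */

    /* Each chunk handed to a thread spans one cache line of the array
       (assuming a is cache-line aligned), so concurrent writes from
       different threads land on different lines. */
    #pragma omp parallel for schedule(dynamic, chunk)
    for (long i = 0; i < n; i++)
        a[i] *= k;
}
```

Compiled with an OpenMP flag such as `-fopenmp` in GCC, the loop runs in parallel; without it the pragma is ignored and the loop runs sequentially with the same result. A chunk size of one cache line is the smallest that avoids sharing a line between threads; larger multiples reduce scheduling overhead at the cost of coarser load balancing.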
