Journal: Computer architecture news

Parallelization, Performance Analysis, and Algorithm Consideration of Hough Transform on Chip Multiprocessors

Abstract

This paper presents a parallelization framework for emerging applications on future chip multiprocessors (CMPs). As CMPs become more prevalent and the number of on-die cores keeps increasing for the foreseeable future, a key issue in harnessing the computational power of such a CMP is how to manage and execute many threads effectively at the same time. We therefore study a parallelization framework that includes (1) coarse-grain and fine-grain multi-threading, (2) performance analysis, and (3) algorithmic changes. In particular, this paper uses the Hough Transform as an example to show how an application can be parallelized.

Starting with a soccer analysis workload that relies heavily on the Hough Transform to detect lines on the soccer field, we extract the coarse-grain data-level parallelism and examine how its performance scales on an 8-core symmetric multiprocessor machine. After identifying the factors that limit parallel performance, we exploit the fine-grain data-level parallelism and evaluate its speedup on the 8-core machine and on a simulated 64-core CMP. Because of parallel overhead and demanding memory requirements, this fine-grain parallelization does not yield a significant performance improvement. We then propose a new Hough Transform and parallelize it in a fine-grain way. Experimental data show that the new Hough Transform exposes a significant amount of concurrency and good data locality. On the simulated 64-core CMP, we achieve parallel scaling of 61.7x, enabling real-time Hough Transform.
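
The abstract does not show the authors' code, so the following is only a minimal sketch of what data-level parallelism over a Hough Transform can look like. It assumes the textbook rho-theta line parameterization, not the new Hough Transform the paper proposes. The names `hough_votes`, `parallel_hough`, and `peak`, and all parameters, are hypothetical. Each worker votes into a private accumulator and the partial accumulators are summed at the end. Threads are used here only to show the structure; in CPython the global interpreter lock prevents them from giving a real speedup on this loop.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def hough_votes(points, n_theta, n_rho, rho_max):
    """Vote a list of (x, y) edge points into a private accumulator.

    Assumes every point satisfies |rho| <= rho_max for all angles.
    """
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Map rho in [-rho_max, rho_max] onto a bin index in [0, n_rho - 1].
            r = int(round((rho + rho_max) * (n_rho - 1) / (2 * rho_max)))
            acc[t][r] += 1
    return acc


def parallel_hough(points, n_theta=180, n_rho=201, rho_max=100.0, workers=4):
    """Split the edge points across workers, then sum the partial accumulators."""
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda c: hough_votes(c, n_theta, n_rho, rho_max), chunks)
        )
    # Reduction step: element-wise sum of the private accumulators.
    return [
        [sum(p[t][r] for p in partials) for r in range(n_rho)]
        for t in range(n_theta)
    ]


def peak(acc):
    """Return (votes, theta_index, rho_index) of the strongest accumulator cell."""
    return max((v, t, r) for t, row in enumerate(acc) for r, v in enumerate(row))
```

For example, `peak(parallel_hough([(x, 20) for x in range(50)]))` finds the horizontal line y = 20 at theta index 90 (90 degrees) with all 50 votes in one cell. Private accumulators avoid synchronizing on shared cells, at the cost of extra memory and a reduction pass, which is one plausible form of the parallel overhead and memory demand that the abstract mentions.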
