首页> 美国政府科技报告 >Enhancing Image Processing Performance for PCID in a Heterogeneous Network of Multi-core Processors
【24h】

Enhancing Image Processing Performance for PCID in a Heterogeneous Network of Multi-core Processors

机译:提高多核处理器异构网络中pCID的图像处理性能

获取原文

摘要

The Physically-Constrained Iterative Deconvolution (PCID) image deblurring code is being ported to heterogeneous networks of multi-core systems. This paper reports results from experiments using the JAWS supercomputer at MHPCC and the Cell Cluster at AFRL in Rome, NY. The results compare approaches to parallelizing FFT executions across the Xeons and the Cell's Synergistic Processing Elements (SPEs) for frame-level image processing. Optimization of FFTs in the PCID code led to a decrease in relative processing time for FFTs. Profiling PCID version 6.2, about one year ago, showed the 13 functions that accounted for the highest percentage of processing were all FFT processing functions. They accounted for over 88% of processing time in one run on Xeons. FFT optimizations led to improvement in the current PCID version 8.0. A recent profile showed that only two of the 19 functions with the highest processing time were FFT processing functions. Timing measurements showed that FFT processing for PCID version 8.0 has been reduced to less than 19% of overall processing time. We are working toward a goal of scaling to 200-400 cores per job (1-2 imagery frames/core). Running a pair of cores on each set of frames assigned to a worker reduces latency by implementing multithreading FFT processing. These results support the next higher level of parallelism in PCID, where groups of frames each producing one resolved image are sent to cliques of cores in a round robin fashion. We are fine-tuning the PCID parallelization strategy to balance processing over Xeons and Cell BEs to find an optimal partitioning of PCID over the heterogeneous processors. Using a publication/subscription oriented information management system to implement a unified communications platform makes runs on large HPCs with thousands of intercommunicating cores more flexible and more fault tolerant. Techniques for adapting the code to single precision and performance results are reported.

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号