Venue: IEEE Conference on High Performance Extreme Computing

Optimization and evaluation of image- and signal-processing kernels on the TI C6678 multi-core DSP

Abstract

Power efficiency is an important aspect of today's high-performance embedded computing (HPEC) systems. Digital signal processors (DSPs) are well known for their power efficiency and are commonly employed in embedded systems. Increasing computational demands in image- and signal-processing applications in embedded systems have led to the development of multi-core DSPs with floating-point capabilities. The TMS320C6678 is an eight-core, high-performance DSP from Texas Instruments that provides 128 GFLOPS of single-precision and 32 GFLOPS of double-precision performance under 10 W of power. In this paper, we optimize and evaluate the performance of the TMS320C6678 DSP using two image-processing kernels, 2D convolution and bilinear interpolation with image rotation, and two signal-processing kernels, frequency-domain finite impulse response (FDFIR) and corner turn. Our 2D convolution results show that the performance of the TMS320C6678 is comparable to an Nvidia GeForce 295 GTX GPU and 5 times better than a quad-core Intel Xeon W3520 CPU. We achieve real-time performance for bilinear interpolation with image rotation on the TMS320C6678 at high-definition (HD) image resolution. Our performance-per-Watt results for FDFIR show that the TMS320C6678 is 8.2 times better than the Nvidia Tesla C2050 GPU. For corner turn, although the raw performance of the Tesla C2050 is better than that of the TMS320C6678, the performance per Watt of the TMS320C6678 is 1.8 times better than that of the Tesla C2050.