16-bit FP sub-word parallelism to facilitate compiler vectorization and improve performance of image and media processing

Abstract

We consider the implementation of 16-bit floating point instructions on a Pentium 4 and a PowerPC G5 for image and media processing. By measuring the execution time of benchmarks with these new simulated instructions, we show that a significant speed-up is obtained compared to 32-bit FP versions. For image processing, the speed-up comes both from doubling the number of operations per SIMD instruction and from better cache behavior with byte storage. For data stream processing with arrays of structures, the speed-up mainly comes from the wider SIMD instructions.
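The "doubling" argument in the abstract can be illustrated with a little arithmetic. The sketch below is not the paper's benchmark and does not model its simulated instructions. It assumes a 128-bit SIMD register (the width of SSE2 on the Pentium 4 and of AltiVec on the PowerPC G5) and uses Python's `struct` half-precision format to compare the storage footprint of the same data in 16-bit and 32-bit floating point. The helper names and the 1024-element sample array are hypothetical, chosen only for illustration.

```python
import struct

# Assumption: 128-bit SIMD registers, as in SSE2 and AltiVec.
SIMD_REGISTER_BITS = 128


def lanes_per_register(element_bits, register_bits=SIMD_REGISTER_BITS):
    """Number of elements of the given width that fit in one SIMD register."""
    return register_bits // element_bits


def packed_size(values, fmt):
    """Bytes needed to store `values` using struct format `fmt` ('f' or 'e')."""
    return len(struct.pack(f"<{len(values)}{fmt}", *values))


# Hypothetical sample data: 1024 pixel intensities in the range 0..255.
pixels = [float(i % 256) for i in range(1024)]

lanes32 = lanes_per_register(32)   # 32-bit FP elements per register
lanes16 = lanes_per_register(16)   # 16-bit FP elements per register

bytes32 = packed_size(pixels, "f")  # IEEE 754 single precision
bytes16 = packed_size(pixels, "e")  # IEEE 754 half precision

# Integers up to 2048 are exactly representable in half precision,
# so 8-bit pixel values survive the round trip unchanged.
roundtrip = list(struct.unpack(f"<{len(pixels)}e",
                               struct.pack(f"<{len(pixels)}e", *pixels)))

print(f"lanes per 128-bit register: fp32={lanes32}, fp16={lanes16}")
print(f"storage for {len(pixels)} values: fp32={bytes32} B, fp16={bytes16} B")
print(f"fp16 round trip exact: {roundtrip == pixels}")
```

Under these assumptions, a 16-bit format packs twice as many elements into each register and halves the memory footprint relative to 32-bit FP. The measured speed-ups reported in the abstract depend on the actual benchmarks and hardware, which this sketch does not reproduce.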
