SIMD Vectorization of Histogram Functions


Abstract

Existing SIMD extensions cannot efficiently vectorize the histogram function because of memory collisions: several elements processed in parallel may need to update the same histogram bin. We propose two techniques to avoid this problem. The first uses a hierarchical structure of three levels. To provide n-way parallelism, auxiliary arrays with n and n/2 subarrays are used in the first and second levels, respectively, and the last level holds the primary histogram array. Indirect SIMD load and store instructions are designed to access different elements of different subarrays. The subarrays in the lower levels are merged, and at the end the calculated results are stored in the primary histogram array. The second technique uses parallel comparators to count the number of identical subwords within a media register. These counts are then added to the values of the histogram array simultaneously. Experimental results obtained by extending the SimpleScalar toolset show that the proposed techniques improve performance over the fastest scalar version by factors of 7.37 and 5.52, respectively.
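The two ideas in the abstract can be illustrated with a plain scalar C emulation. This is only a sketch under stated assumptions, not the paper's proposed instructions: the names `hist_hierarchical` and `hist_compare`, the lane count `LANES` (standing in for n) and the 8-bit input range `BINS` are all hypothetical, and ordinary loops stand in for the indirect SIMD loads/stores and the parallel comparators.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 8    /* assumed n-way parallelism (n = 8) */
#define BINS  256  /* assumed 8-bit input values */

/* Sketch of the first idea: each lane updates its own private
 * sub-histogram, so two lanes never write the same memory location.
 * The n subarrays are then merged into n/2 subarrays, and those into
 * the primary histogram. */
void hist_hierarchical(const uint8_t *data, size_t len, uint32_t hist[BINS])
{
    uint32_t level1[LANES][BINS];      /* first level: n subarrays    */
    uint32_t level2[LANES / 2][BINS];  /* second level: n/2 subarrays */
    size_t i = 0;

    memset(level1, 0, sizeof level1);

    /* The inner loop stands in for one indirect SIMD load/add/store. */
    for (; i + LANES <= len; i += LANES)
        for (int lane = 0; lane < LANES; lane++)
            level1[lane][data[i + lane]]++;
    for (; i < len; i++)               /* leftover tail elements */
        level1[0][data[i]]++;

    /* Merge level 1 into level 2 pairwise. */
    for (int j = 0; j < LANES / 2; j++)
        for (int b = 0; b < BINS; b++)
            level2[j][b] = level1[2 * j][b] + level1[2 * j + 1][b];

    /* Merge level 2 into the primary histogram. */
    for (int b = 0; b < BINS; b++) {
        uint32_t sum = 0;
        for (int j = 0; j < LANES / 2; j++)
            sum += level2[j][b];
        hist[b] = sum;
    }
}

/* Sketch of the second idea: within each group of LANES subwords,
 * count how many subwords are equal, then add each count to the
 * histogram once per distinct value. */
void hist_compare(const uint8_t *data, size_t len, uint32_t hist[BINS])
{
    size_t i = 0;

    memset(hist, 0, BINS * sizeof hist[0]);

    for (; i + LANES <= len; i += LANES) {
        const uint8_t *g = data + i;
        for (int a = 0; a < LANES; a++) {
            uint32_t count = 0;
            int first = 1;             /* is g[a] the first copy in the group? */
            for (int b = 0; b < LANES; b++) {
                if (g[b] == g[a]) {
                    count++;
                    if (b < a)
                        first = 0;
                }
            }
            if (first)                 /* one update per distinct value */
                hist[g[a]] += count;
        }
    }
    for (; i < len; i++)               /* leftover tail elements */
        hist[data[i]]++;
}
```

Both functions compute the same histogram as a simple scalar loop. The first trades extra memory (n + n/2 subarrays) for collision-free updates, while the second needs no auxiliary arrays but performs on the order of n² comparisons per group, which in hardware would be done by comparators working in parallel.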
