International Symposium on Computer Architecture and High Performance Computing

Dynamic Inter-Thread Vectorization Architecture: Extracting DLP from TLP

Abstract

Threads of Single-Program Multiple-Data (SPMD) applications often execute the same instructions on different data. We propose the Dynamic Inter-Thread Vectorization Architecture (DITVA) to leverage this implicit data-level parallelism in SPMD applications by assembling dynamic vector instructions at runtime. DITVA extends an SIMD-enabled in-order SMT processor with an inter-thread vectorization execution mode. In this mode, multiple scalar threads running in lockstep share a single instruction stream and their respective instruction instances are aggregated into SIMD instructions. To balance thread- and data-level parallelism, threads are statically grouped into fixed-size independently scheduled warps. DITVA leverages existing SIMD units and maintains binary compatibility with existing CPU architectures. Our evaluation on the SPMD applications from the PARSEC and Rodinia OpenMP benchmarks shows that a 4-warp × 4-lane 4-issue DITVA architecture with a realistic bank-interleaved cache achieves 1.55× higher performance than a 4-thread 4-issue SMT architecture with AVX instructions while fetching and issuing 51% fewer instructions, achieving an overall 24% energy reduction.
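The abstract describes lockstep threads sharing one instruction stream. The following minimal Python sketch is illustrative only and is not the authors' implementation. It shows how threads statically grouped into a warp can have their identical instruction instances merged into one dynamic vector instruction with a lane mask.

The sketch rests on several assumptions. Each thread is modeled only by a hypothetical trace of the PCs it executes. When threads diverge, the warp follows a "lowest PC first" selection rule, which is an assumed reconvergence heuristic that the abstract does not specify. The names `Thread`, `group_into_warps` and `run_warp` are hypothetical. Caches, the SIMD datapath and energy are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    """A scalar SPMD thread, modeled only by the PCs it will execute."""
    tid: int
    trace: list[int]  # hypothetical input: dynamic PC sequence of this thread
    pos: int = 0      # index of the next instruction in the trace

    def done(self) -> bool:
        return self.pos >= len(self.trace)

    def pc(self) -> int:
        return self.trace[self.pos]


def group_into_warps(threads: list, warp_size: int) -> list[list]:
    """Statically partition threads into fixed-size warps: thread i goes to warp i // warp_size."""
    return [threads[i:i + warp_size] for i in range(0, len(threads), warp_size)]


def run_warp(warp: list[Thread]) -> list[tuple[int, int]]:
    """Fetch one instruction per step for the whole warp; return (pc, lane_mask) pairs.

    Every unfinished thread whose next PC equals the selected PC contributes a lane
    to the same dynamic vector instruction; the other threads wait. Choosing the
    lowest PC is an assumed reconvergence heuristic, not necessarily DITVA's policy.
    """
    issued = []
    while True:
        active = [t for t in warp if not t.done()]
        if not active:
            return issued
        pc = min(t.pc() for t in active)  # assumed min-PC selection policy
        mask = 0
        for lane, t in enumerate(warp):
            if not t.done() and t.pc() == pc:
                mask |= 1 << lane  # this lane joins the vector instruction
                t.pos += 1
        issued.append((pc, mask))


# Example: one 4-lane warp that diverges after PC 1 and reconverges at PC 5.
traces = [
    [0, 1, 2, 5, 6],     # lanes 0-1 take the short path
    [0, 1, 2, 5, 6],
    [0, 1, 3, 4, 5, 6],  # lanes 2-3 take the long path
    [0, 1, 3, 4, 5, 6],
]
threads = [Thread(i, tr) for i, tr in enumerate(traces)]
warp = group_into_warps(threads, warp_size=4)[0]
issued = run_warp(warp)
scalar_count = sum(len(tr) for tr in traces)
print(len(issued), scalar_count)  # 7 22
```

In this toy run, 7 fetched instructions cover 22 scalar instruction instances. Masks such as `0b0011` and `0b1100` mark the partially filled vector instructions issued while the warp is diverged. The ratio depends entirely on the made-up traces and is unrelated to the 51% figure reported in the abstract. With 16 threads and `warp_size=4`, `group_into_warps` yields four independently scheduled warps, matching the 4-warp × 4-lane shape named in the abstract.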