Optimizing the Efficiency of Data Transfer in Dataflow Architectures

Abstract

Dataflow architectures have been proposed in response to several emerging problems in processor design, such as computational efficiency, design complexity, and power efficiency. A dataflow architecture is composed of multiple processing elements (PEs) organized as a grid. Instructions are generated by the compiler and explicitly mapped onto the PE grid by an instruction placement algorithm. Instruction placement algorithms generally take load balancing, low communication delay, and resource contention as input conditions, but most of them ignore router congestion in the dataflow network. By applying three typical instruction placement algorithms to a dataflow model, we found that the dynamic packets transferred through the dataflow network are not evenly distributed. Because a dataflow network usually adopts a uniform structure for each router, congestion is prone to occur in the directions that handle larger loads. Partial congestion degrades the transfer efficiency of the dataflow network, which directly affects the execution efficiency of the dataflow processor. To optimize the transfer efficiency of routers and the execution efficiency of the dataflow processor under unevenly distributed network load, we propose a cost-efficient hardware mechanism that dynamically detects imbalances among the different directions in each router and adaptively reallocates resources in the bottleneck router. The proposed mechanism is transparent to the compiler and to instruction placement algorithms. Moreover, it can easily be applied as a supplementary hardware optimization to any instruction placement algorithm that causes this kind of partial congestion. We evaluated the proposed hardware mechanism on a dataflow model, and the results show that it increases average computational performance by 15.9% and average functional-unit utilization by 15.6%.
Crucially, our approach increases area and power consumption by less than 1%. In conclusion, the evaluation results suggest that our approach effectively improves the efficiency of data transfer in dataflow processors.
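
The abstract does not describe how the hardware detects imbalance or which resources it reallocates, so the following is only a rough software sketch of the general idea, not the authors' design. It assumes, purely for illustration, that each router counts packets per direction over an epoch, that the reallocated resource is a pool of buffer slots, and that a fixed threshold triggers reallocation. The class name `Router`, the parameters `total_slots`, `min_slots`, and `threshold`, and the proportional policy are all hypothetical.

```python
# Illustrative model only: names, parameters, and policy are assumptions,
# not taken from the paper.
DIRECTIONS = ("north", "east", "south", "west", "local")


class Router:
    def __init__(self, total_slots=20, min_slots=2, threshold=0.2):
        self.total_slots = total_slots  # hypothetical: slots shared by all ports
        self.min_slots = min_slots      # hypothetical: floor so no port starves
        self.threshold = threshold      # hypothetical: excess share that triggers
        even = total_slots // len(DIRECTIONS)
        self.slots = {d: even for d in DIRECTIONS}  # uniform starting structure
        self.load = {d: 0 for d in DIRECTIONS}      # per-direction packet counters

    def observe(self, direction, packets=1):
        """Count packets forwarded in one direction during the current epoch."""
        self.load[direction] += packets

    def imbalance(self):
        """Share of traffic on the busiest direction, minus the even share."""
        total = sum(self.load.values())
        if total == 0:
            return 0.0
        return max(self.load.values()) / total - 1 / len(DIRECTIONS)

    def rebalance(self):
        """End an epoch: reallocate slots if imbalanced, then reset counters."""
        triggered = self.imbalance() > self.threshold
        if triggered:
            total = sum(self.load.values())
            spare = self.total_slots - self.min_slots * len(DIRECTIONS)
            # Split the spare slots in proportion to observed load,
            # using the largest-remainder method to keep the total exact.
            quota = {d: spare * self.load[d] / total for d in DIRECTIONS}
            extra = {d: int(quota[d]) for d in DIRECTIONS}
            leftover = spare - sum(extra.values())
            by_remainder = sorted(
                DIRECTIONS, key=lambda d: quota[d] - extra[d], reverse=True
            )
            for d in by_remainder[:leftover]:
                extra[d] += 1
            self.slots = {d: self.min_slots + extra[d] for d in DIRECTIONS}
        self.load = {d: 0 for d in DIRECTIONS}
        return triggered


if __name__ == "__main__":
    router = Router()
    for direction, packets in {"east": 60, "north": 10, "south": 10,
                               "west": 10, "local": 10}.items():
        router.observe(direction, packets)
    print(router.rebalance(), router.slots)
```

In this toy model the decision uses only counters local to the router, which mirrors the abstract's claim that the mechanism is transparent to the compiler and to the instruction placement algorithm. A real router would do this in hardware and might reallocate something other than buffer slots.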