Journal: IEEE Transactions on Computers

A highly OR-parallel inference machine (Multi-ASCA) and its performance evaluation: an architecture and its load balancing algorithms

Abstract

An architecture for a highly OR-parallel inference machine and four load balancing algorithms for it are proposed, and its performance is evaluated in a trace-driven simulation study. The inference machine consists of a large number of processing elements (PEs) whose serial I/O links connect them directly to each other in a simply modified mesh network. Each PE is a high-speed sequential Prolog processor with its own local memory. The activity of all PEs is controlled locally by four new load balancing algorithms based on purely local communication: communication is allowed only between directly connected PEs. These algorithms reduce the communication overhead of load balancing and make highly OR-parallel execution possible. A software simulator using a trace-driven simulation technique based on an inference tree has been developed, and some typical OR-parallel benchmarks, such as the n-queens problem, have been simulated on it. Compared with a conventional copying method, the interaction of these load balancing algorithms reduces the average communication per load balancing operation to between 1/30 and 1/100 of its former value. The inference machine (1024 PEs in a 32 × 32 array) attains a parallel speedup of 300 to 600, assuming 1 MLIPS (mega logical inferences per second) per PE and 20 MBPS (megabits per second) per serial I/O link, a configuration that could easily be integrated on a single chip using current VLSI technology. This highly OR-parallel inference machine promises to be an important step towards the realization of a high-performance artificial intelligence system.
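To put the reported figures in perspective, the short sketch below works out the parallel efficiency and aggregate inference rate they imply. It uses only the numbers quoted in the abstract (1024 PEs, a speedup of 300 to 600, 1 MLIPS per PE). The function names are illustrative, and the aggregate rate assumes the speedup is measured against a single PE running at the same 1 MLIPS, which the abstract does not state explicitly.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Assumption (not stated in the abstract): speedup is relative to one
# 1 MLIPS PE, so aggregate rate = speedup * per-PE rate.

NUM_PES = 32 * 32          # 32 x 32 array of processing elements
PER_PE_MLIPS = 1.0         # assumed per-PE inference rate from the abstract
SPEEDUP_RANGE = (300, 600) # reported parallel speedup


def parallel_efficiency(speedup: float, num_pes: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / num_pes


def aggregate_mlips(speedup: float, per_pe_mlips: float) -> float:
    """Effective machine-wide inference rate under the stated assumption."""
    return speedup * per_pe_mlips


for speedup in SPEEDUP_RANGE:
    eff = parallel_efficiency(speedup, NUM_PES)
    rate = aggregate_mlips(speedup, PER_PE_MLIPS)
    print(f"speedup {speedup}: efficiency {eff:.1%}, about {rate:.0f} MLIPS")
```

Under these assumptions the machine runs at roughly 29% to 59% of ideal linear speedup, which corresponds to an effective 300 to 600 MLIPS.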
