A plurality of parallel processing elements are connected in a binary tree configuration, with each processing element except those in the highest and lowest levels being in communication with a single parent processing element as well as first and second (or left and right) child processing elements. Each processing element comprises a processor, a read/write or random access memory, and an input/output (I/O) device. The I/O device provides interfacing between each processing element and its parent and children processing elements so as to provide significant improvements in propagation speeds through the binary tree. The I/O device allows the presently preferred embodiment of the invention to be clocked at 12 megahertz, producing in the case of a tree of 1023 processors, each having an average instruction cycle time of 1.8 mu s, a system with a raw computational throughput of approximately 570 million instructions per second. The I/O device communicates data and queries from the root processing element to all other N processing elements in the array in one processor instruction cycle instead of in O(log2N) processor instruction cycles as in prior art binary tree arrays. Primitive queries are executed in parallel by each processing element and the results made available for reporting back to the root processing element. In several important cases, these results can be combined and reported back to the root processing element in a single processor instruction cycle instead of in O(log2N) processor instruction cycles as in prior art binary tree arrays. Thus, the elapsed time for a broadcast and report operation is in effect a constant time regardless of the number of processors in the array.
展开▼
机译:多个并行处理元素以二叉树结构连接,每个处理元素除了最高和最低级别的元素之外,都与单个父处理元素以及第一和第二(或左和右)子处理元素通信。每个处理元件包括处理器,读/写或随机存取存储器以及输入/输出(I / O)设备。 I / O设备在每个处理元件与其父和子处理元件之间提供接口,从而大大改善了通过二叉树的传播速度。该I / O设备允许本发明的当前优选实施例的时钟频率为12MHz,在1023个处理器的树的情况下产生,每个处理器的平均指令周期时间为1.8μs,该系统具有原始计算吞吐量。每秒约5.7亿条指令。 I / O设备在一个处理器指令周期中而不是在现有技术二叉树阵列中的O(log2N)个处理器指令周期中将数据和查询从根处理元素传送到阵列中的所有其他N个处理元素。每个处理元素并行执行原始查询,并使结果可用于报告回根处理元素。在几种重要的情况下,这些结果可以组合并在单个处理器指令周期中报告给根处理元素,而不是像现有技术二叉树阵列那样在O(log2N)个处理器指令周期中报告给根处理元素。因此,广播和报告操作的经过时间实际上是恒定时间,而不管阵列中处理器的数量如何。
展开▼