Highly parallel computer architecture employing crossbar switch with selectable pipeline delay

Abstract

A crossbar switch that connects N (N = 2^k; k = 0, 1, 2, 3) coarse-grain processing elements (each rated at 20 million floating-point operations per second) to a plurality of memories provides a parallel processing system free of memory conflicts over a wide range of arithmetic computations (i.e., scalar, vector, and matrix). The configuration of the crossbar switch, i.e., the connection between each processing element unit and each parallel memory module, may be changed dynamically on a cycle-by-cycle basis in accordance with the requirements of the algorithm under execution. Although certain crossbar usage rules must be obeyed, the data is mapped over parallel memory such that the processing element units can access and operate on input streams of data in a highly parallel fashion, with an effective memory transfer rate and computational throughput comparable to present-day supercomputers. The crossbar switch comprises two basic sections: a multiplexer and a control section. The multiplexer provides the actual switching of signal paths, i.e., it connects each processing element unit to a particular parallel memory on each clock cycle (100 nsec). The control section determines which connections are made on each clock cycle in accordance with the algorithm under execution. Selectable pipelined delay in the control section provides optimal data transfer efficiency between the processors and memory modules over a wide range of array processing algorithms. The crossbar switch also provides for graceful system degradation in computational throughput without the need to download a new program.
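
The split between a control section with a selectable pipeline delay and a multiplexer that does the per-cycle switching can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the names `Crossbar`, `step`, and `route` are hypothetical, the delay is modeled as a simple FIFO of connection settings, and the abstract's unspecified "crossbar usage rules" are stood in for by one assumed rule (no two processing elements may address the same memory in the same cycle).

```python
from collections import deque


class Crossbar:
    """Toy model of the control section (hypothetical, for illustration only)."""

    def __init__(self, n_pe, n_mem, delay):
        # The abstract gives N = 2^k for k = 0..3, i.e. 1, 2, 4, or 8 elements.
        if n_pe not in (1, 2, 4, 8):
            raise ValueError("n_pe must be 1, 2, 4, or 8")
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.n_pe = n_pe
        self.n_mem = n_mem
        # Selectable pipeline delay: `delay` empty stages ahead of the multiplexer.
        self.pipeline = deque([None] * delay)

    def _check(self, setting):
        # Assumed usage rule: one memory index per PE, all in range, no duplicates.
        if len(setting) != self.n_pe:
            raise ValueError("one memory index is required per processing element")
        if any(not 0 <= m < self.n_mem for m in setting):
            raise ValueError("memory index out of range")
        if len(set(setting)) != len(setting):
            raise ValueError("memory conflict: two PEs target the same memory")

    def step(self, setting):
        """Issue a setting (or None) this cycle; return the one taking effect now."""
        if setting is not None:
            self._check(setting)
        self.pipeline.append(setting)
        return self.pipeline.popleft()


def route(setting, memories, address):
    """Multiplexer: processing element i reads memories[setting[i]][address]."""
    return [memories[m][address] for m in setting]


if __name__ == "__main__":
    xb = Crossbar(n_pe=4, n_mem=4, delay=2)
    memories = [[10, 11], [20, 21], [30, 31], [40, 41]]
    schedule = [[0, 1, 2, 3], [1, 2, 3, 0], None, None]
    for cycle, issued in enumerate(schedule):
        active = xb.step(issued)
        data = route(active, memories, 0) if active is not None else None
        print(cycle, active, data)
```

In this model a setting issued on cycle t takes effect on cycle t + delay, so choosing the delay amounts to choosing how far ahead of the data the control words are issued. How the actual hardware selects and applies the delay is not described in the abstract.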