Journal: Computer Architecture News

Dissecting Cyclops: A Detailed Analysis of a Multithreaded Architecture


Abstract

Multiprocessor systems-on-a-chip offer a structured approach to managing complexity in chip design. Cyclops is a new family of multithreaded architectures which integrates processing logic, main memory and communications hardware on a single chip. Its simple, hierarchical design allows the hardware architect to manage a large number of components to meet the design constraints in terms of performance, power or application domain. This paper evaluates several alternative Cyclops designs with different relative costs and trade-offs. We compare the performance of several scientific kernels running on different configurations of this architecture. We show that by increasing the number of threads sharing a floating-point unit we can hide fairly high cache and memory latencies. We prove that we can reach the theoretical peak performance of the chip and we identify the optimal balance of components for each application. We demonstrate that the design is well adapted to solve problems that are difficult to optimize. For example, we show that sparse matrix-vector multiplication obtains 16 GFlops out of 32 GFlops of peak performance.
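
To illustrate why sparse matrix-vector multiplication counts as a kernel that is difficult to optimize, here is a minimal sketch in plain Python. It assumes the common compressed sparse row (CSR) layout; the abstract does not say which storage format or implementation the authors used, and the names `spmv_csr` and `fraction_of_peak` are hypothetical. The second helper only restates the reported figures as a ratio.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form (assumed layout)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # The nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read indirectly through col_idx, so accesses are irregular;
            # this is the kind of latency that extra threads can help hide.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def fraction_of_peak(achieved_gflops, peak_gflops):
    """Return achieved performance as a fraction of peak performance."""
    return achieved_gflops / peak_gflops


# Small example matrix: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

With the figures quoted in the abstract, `fraction_of_peak(16, 32)` gives 0.5, meaning the kernel reaches half of the chip's peak.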
