
A study of simulation and verification of a many-core architecture on two modern reconfigurable platforms.



Abstract

Recent advances in computer performance have been hindered by the physical limitations of current state-of-the-art semiconductor manufacturing technology. Steady performance growth by means of increasing the operating frequency is no longer possible.

On the one hand, we are "Hitting the Memory Wall"[1]: we need to increase the cache size to reduce the probability of cache misses. On the other hand, the increased cache size and the resulting transistor count increase static and dynamic current leakage[2]. This results in exponential growth of power consumption.

To keep up with the steady demand for increased performance, the major microprocessor manufacturers have made a paradigm shift towards multicore and many-core computer architecture designs.

This trend goes as far as integrating a very large number of simple processors onto a single die. This type of architecture is excellent for high-performance acceleration of domain-specific tasks. To achieve the best possible results, these accelerator platforms should be coupled with general-purpose microprocessors, which can take over the burden of running the operating system. One should note that recent advances in GPGPU technology, along with steadily growing FPGA performance, present other pathways for creating alternative acceleration platforms.

The IBM Cyclops64 chip is part of a petaflop-class supercomputer architecture. The chip is a multicore architecture with a very large number of execution cores, memory banks and other components integrated on a single die. These components are interconnected via the C64 Crossbar Switch, an efficient interconnection network. Simulation of such an interconnection network is a very important task throughout the design and implementation process.

This thesis describes the design and implementation of, and experimentation with, an environment that may be used for acceleration, verification and validation of this interconnection network. In addition, a latency-accurate Cyclops64 architectural simulator environment has been extended and accelerated.

Under the iterative emulation technology first proposed at CAPSL, named "DIMES"[3], a portion of the FPGA resources is time-shared among several identical modules of the target design and used iteratively to emulate them in multiple steps. The representation of the identical modules in the FPGA consists of (1) a single module copy and (2) a storage block holding the states of all the modules during iterative emulation. With the help of this technology, the Cyclops32[4, 5] chip and the Cyclops64 Crossbar Switch[6] were implemented earlier on the AlphaData[7] platform. Additionally, the Cyclops64 chip has recently been fully implemented on the IBM MrsClops[8] Emulation Engine.

The major contributions of this document are:

(i) We have ported the Cyclops64 interconnection network logic onto several state-of-the-art FPGA coprocessing accelerator platforms. The increase in emulation speed and the new logic designs of the Cyclops64 architecture were the main driving forces for this work. Platforms such as the XtremeData[9] XD1000 and the DRC Computer[10] DS1000 were used. Working on those novel platforms was a particularly interesting and challenging experience. We had to work on a range of different FPGA devices, and we faced and solved problems associated with bugs in vendor-provided user interface logic, documentation and hardware device implementation. Throughout the process, we provided valuable feedback to the platform designers. Future generations of these platforms will benefit from the resulting upgrades.

(ii) Using those FPGA accelerator platforms, and building on the work of Fei Chen on the "LAST"[11] simulator, we were able to create a new type of computer architecture simulation. By combining software Simulation with hardware Emulation, in what we call the "SEmulator," we were able to improve the
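The iterative emulation idea described in the abstract (one physical module copy plus a storage block holding every module's state) can be illustrated with a small software sketch. This is not the DIMES implementation: the module logic, the ring wiring between modules, and the names `module_step` and `emulate_time_shared` are all assumptions made up for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical module logic (an assumption, not taken from the thesis):
# a single register that accumulates its input and drives it as output.
def module_step(state: int, inp: int) -> Tuple[int, int]:
    next_state = state + inp
    return next_state, next_state  # (new register value, output signal)


def emulate_time_shared(
    step: Callable[[int, int], Tuple[int, int]],
    init_states: List[int],
    cycles: int,
) -> List[int]:
    """Emulate len(init_states) identical modules with ONE copy of the logic.

    `step` plays the role of the single module copy; `states` and `outputs`
    play the role of the storage block. Modules are wired in a ring
    (an assumed topology): module i reads the previous-cycle output of
    module i - 1.
    """
    n = len(init_states)
    states = list(init_states)   # storage block: saved state of every module
    outputs = list(init_states)  # assumed: initial output equals initial state

    for _ in range(cycles):
        # Double-buffer so every module sees previous-cycle outputs,
        # as it would if all copies ran in parallel in real hardware.
        new_states = [0] * n
        new_outputs = [0] * n
        for i in range(n):  # n emulation steps per target clock cycle
            inp = outputs[(i - 1) % n]
            new_states[i], new_outputs[i] = step(states[i], inp)
        states, outputs = new_states, new_outputs

    return states


if __name__ == "__main__":
    print(emulate_time_shared(module_step, [1, 0, 0], 3))
```

The sketch shows the trade-off the abstract refers to: one target clock cycle costs `n` emulation steps, while the logic is instantiated only once and only the per-module state grows with `n`.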

Bibliographic record

  • Author

    Krepis, Dimitrij

  • Author affiliation

    University of Delaware, Department of Electrical and Computer Engineering

  • Degree-granting institution University of Delaware, Department of Electrical and Computer Engineering
  • Subject Engineering, Electronics and Electrical
  • Degree M.E.E.
  • Year 2007
  • Pages 70 p.
  • Total pages 70
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Radio electronics, telecommunications technology
  • Keywords
