Memory profiling on shared-memory multiprocessors.

Abstract

Tuning application memory performance can be difficult on any system but is particularly so on distributed shared-memory (DSM) multiprocessors. This is due to the implicit nature of communication, the unforeseen interactions among the processors, and the long remote memory latencies. Tools, called memory profilers, that allow the user to map memory behavior back to application data structures can be invaluable aids to the programmer. Unfortunately, memory profiling is difficult to implement efficiently since most systems lack the requisite hardware support. This dissertation introduces two techniques for efficient memory profiling, each requiring hardware support on either the processor or the system node controller.

The first technique, called TrapPoint, uses processor support for a trapping cache miss to point out memory bottlenecks. We construct a prototype on the versatile FLASH multiprocessor to study its feasibility. We show that modest processor support can be used to construct a useful memory profiler with acceptable overhead.

The FlashPoint memory profiler uses support on the system node controller to collect similar performance information. The FLASH multiprocessor was designed to allow for instrumentation of the node controller, enabling us to construct a prototype. Since profiling is done in the node controller, FlashPoint has access to more information about the memory traffic, such as cache-coherence events, than a processor-based monitor such as TrapPoint. It is therefore able to collect an extended memory profile.

Although FlashPoint requires more hardware support than TrapPoint, it overcomes many of TrapPoint's shortcomings. The required actions for memory profiling are quite similar to those required for cache coherence, so there are numerous synergies in implementing memory profiling on the same node controller that manages the cache-coherence protocol. Performing memory profiling in the node controller therefore allows a memory profiler to collect more data with lower overhead and higher accuracy than is possible on the processor.

Since memory profiling data can be so valuable and it can be collected with relatively little hardware support, we argue that future DSM multiprocessors should be designed with support for memory profiling. This support is best done in the system node controller, but for implementations where this is infeasible, an acceptable monitor can be implemented with processor support.
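
The abstract describes memory profilers as tools that map memory behavior back to application data structures. As a rough illustration of that attribution step only, the following Python sketch counts cache-miss addresses per data structure. It is not the TrapPoint or FlashPoint implementation, which rely on hardware support. The region table, the addresses, and the names `REGIONS`, `attribute`, and `profile` are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical address map: (start_address, size_in_bytes, data_structure_name).
# A real profiler would obtain this from the allocator or symbol table.
REGIONS = [
    (0x1000, 0x400, "particles"),
    (0x2000, 0x800, "grid"),
    (0x4000, 0x100, "locks"),
]


def build_index(regions):
    """Sort regions by start address so lookups can use binary search."""
    ordered = sorted(regions)
    starts = [start for start, _, _ in ordered]
    return starts, ordered


def attribute(address, index):
    """Return the name of the data structure containing `address`."""
    starts, ordered = index
    pos = bisect_right(starts, address) - 1  # last region starting at or before address
    if pos < 0:
        return "<unknown>"
    start, size, name = ordered[pos]
    return name if address < start + size else "<unknown>"


def profile(miss_addresses, regions=REGIONS):
    """Count misses per data structure for a stream of miss addresses."""
    index = build_index(regions)
    return Counter(attribute(addr, index) for addr in miss_addresses)


if __name__ == "__main__":
    # Made-up miss addresses standing in for what the hardware would report.
    sample = [0x1000, 0x13FF, 0x1400, 0x2004, 0x4000, 0x0]
    for name, count in profile(sample).most_common():
        print(f"{name}: {count}")
```

In this sketch the miss addresses are supplied as a plain list. In the systems the abstract describes, they would instead come from a trapping cache miss on the processor or from instrumentation in the node controller.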