
Using hardware monitors to automatically improve memory performance.



Abstract

In this thesis, we propose and evaluate several techniques to dynamically increase the memory access locality of scientific and Java server applications running on cache-coherent non-uniform memory access (cc-NUMA) servers. We first introduce a user-level online page migration scheme in which applications are profiled using hardware monitors to determine the preferred locations of their memory pages. The pages are then migrated to the corresponding memory units via system calls. In our approach, both profiling and page migration are conducted online while the application runs. We also investigate several potential sources of profiles gathered from hardware monitors for use in dynamic page migration, and compare their effectiveness with that of profiles from centralized hardware monitors. In particular, we evaluate profiles from on-chip CPU monitors, valid TLB content, and a hypothetical hardware feature.

We also introduce a set of techniques to both measure and optimize memory access locality in Java server applications running on cc-NUMA servers. In particular, we propose several NUMA-aware Java heap layouts for initial object allocation, and dynamic object migration during garbage collection to move objects to memory local to the processors that access them most. To evaluate these techniques, we also introduce a new hybrid simulation approach for the memory behavior of parallel applications. It gathers a partial trace of memory accesses from hardware monitors during an actual run of an application and extrapolates it to a representative full trace.

Our dynamic page migration approach reduced the number of non-local accesses by up to 90%, which resulted in a performance improvement of up to 16%. Our results demonstrated that the combination of inexpensive hardware monitors and a simple migration policy can be used effectively to improve the performance of real scientific applications. Our simulation study demonstrated that cache miss profiles gathered from on-chip hardware monitors, which are typically available in current microprocessors, can be used effectively to guide dynamic page migration in an application. Our NUMA-aware heap layouts reduced the total number of non-local object accesses in SPECjbb2000 by up to 41%, which resulted in a reduction of up to 40% in the memory wait time of the workload.
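
The abstract does not spell out the migration policy, so the following is only an illustrative sketch of the general idea: count sampled accesses per page and per NUMA node, pick the node that touches each page most often, and plan a migration when that node differs from the page's current home. The function names, the `min_samples` threshold, and the majority rule are assumptions made for this example, not details taken from the thesis, and no real system call is issued.

```python
from collections import defaultdict


def preferred_nodes(samples, min_samples=4):
    """Pick a preferred NUMA node for each page from sampled accesses.

    samples: iterable of (page, node) pairs, one per sampled access
             (for example, one sampled cache miss from a hardware monitor).
    min_samples: hypothetical threshold; pages with fewer samples are
                 skipped because their profile is too thin to act on.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for page, node in samples:
        counts[page][node] += 1

    preferred = {}
    for page, per_node in counts.items():
        if sum(per_node.values()) < min_samples:
            continue
        # Majority rule (an assumption): most accesses wins,
        # ties go to the lowest node id.
        preferred[page] = min(per_node, key=lambda n: (-per_node[n], n))
    return preferred


def plan_migrations(current_home, preferred):
    """Return {page: target_node} for pages not already on their preferred node."""
    return {
        page: node
        for page, node in preferred.items()
        if current_home.get(page) != node
    }


# Toy profile: page 0x1000 is mostly touched by node 1 but lives on node 0.
samples = (
    [(0x1000, 1)] * 5 + [(0x1000, 0)] * 2
    + [(0x2000, 0)] * 4
    + [(0x3000, 1)] * 2  # too few samples, ignored
)
current_home = {0x1000: 0, 0x2000: 0, 0x3000: 0}
plan = plan_migrations(current_home, preferred_nodes(samples))
```

In a real user-level scheme, the resulting plan would be handed to an operating-system page migration call; which call is used depends on the platform and is not stated in the abstract.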

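Bibliographic details

  • Author: Tikir, Mustafa M.
  • Author affiliation: University of Maryland, College Park
  • Degree-granting institution: University of Maryland, College Park
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 149 p.
  • Total pages: 149
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology; computer technology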
