Improving the data cache performance of multiprocessor operating systems


Abstract

Bus-based shared-memory multiprocessors with coherent caches have recently become very popular. To achieve high performance, these systems rely on increasingly sophisticated cache hierarchies. However, while these machines often run loads with substantial operating system activity, performance measurements have consistently indicated that the operating system uses the data cache hierarchy poorly. In this paper, we address the issue of how to eliminate most of the data cache misses in a multiprocessor operating system while still using off-the-shelf processors. We use a performance monitor to examine traces of a 4-processor machine running four system-intensive loads under UNIX. Based on our observations, we propose hardware and software support that targets block operations, coherence activity, and cache conflicts. For block operations, simple cache bypassing or prefetching schemes are undesirable. Instead, it is best to use a DMA-like scheme that pipelines the data transfer on the bus without involving the processor. Coherence misses are handled with data privatization and relocation, and the use of updates for a small core of shared variables. Finally, the remaining miss hot spots are handled with data prefetching. Overall, our simulations show that all these optimizations combined eliminate or hide 75% of the operating system data misses in 32-Kbyte primary caches. Furthermore, they speed up the operating system by 19%.
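
As a rough illustration of why data privatization can remove coherence misses, the following toy model counts misses under a write-invalidate protocol. It is a minimal sketch and not the paper's simulator: the function `count_misses`, the block names, the infinite cache size, and the round-robin access pattern are all assumptions made for this example.

```python
def count_misses(accesses, num_cpus):
    """Count misses for a sequence of (cpu, block) write accesses.

    Toy write-invalidate model (an assumption for illustration):
    caches are infinite, so every miss is either a cold miss or a
    coherence miss caused by another CPU's earlier write.
    """
    valid = [set() for _ in range(num_cpus)]  # blocks valid in each cache
    misses = 0
    for cpu, block in accesses:
        if block not in valid[cpu]:
            misses += 1
            valid[cpu].add(block)
        # A write invalidates the copies held by all other CPUs.
        for other in range(num_cpus):
            if other != cpu:
                valid[other].discard(block)
    return misses


NUM_CPUS = 4   # matches the 4-processor machine in the abstract
ROUNDS = 100   # hypothetical number of update rounds

# Shared layout: every CPU updates the same counter block in turn.
shared_trace = [(cpu, "counter")
                for _ in range(ROUNDS) for cpu in range(NUM_CPUS)]

# Privatized layout: each CPU updates its own copy in its own block.
private_trace = [(cpu, f"counter_cpu{cpu}")
                 for _ in range(ROUNDS) for cpu in range(NUM_CPUS)]

shared_misses = count_misses(shared_trace, NUM_CPUS)
private_misses = count_misses(private_trace, NUM_CPUS)
print(shared_misses, private_misses)  # prints "400 4"
```

In this model the shared counter misses on every update, because the previous writer is always a different CPU, while the privatized layout pays only one cold miss per CPU. Real kernels would still have to combine the per-CPU copies when a global value is needed, which this sketch leaves out.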


Bibliographic details

Source: International Symposium on High-Performance Computer Architecture, 1996, 10 pages
Authors: Chun Xia; Torrellas J.; Institute of Electrical and Electronics Engineers
Original format: PDF
Chinese Library Classification: Automation technology, computer technology

Similar literature

1. Improving Data Cache Performance with Integrated Use of Split Caches, Victim Cache and Stream Buffers [J]. Afrin Naz, Mehran Rezaei, Krishna Kavi. Computer Architecture News, 2005, No. 3.
2. A performance study of cache coherence protocols and write caches for parallel-multithreaded shared-memory multiprocessors [J]. Chao-Chin Wu, Cheng Chen. Journal of the Chinese Institute of Engineers, 1998, No. 1.
3. A trace-driven simulator for performance evaluation of cache-based multiprocessor systems [J]. Prete C.A., Prina G. IEEE Transactions on Parallel and Distributed Systems, 1995, No. 9.
4. Improving the data cache performance of multiprocessor operating systems [C]. Chun Xia, Torrellas J. 1996.
5. The effects of cache coherence on the performance of parallel PDE algorithms in multiprocessor systems [D]. Johnson, Sandra Kay. 1988.
6. Software and Hardware Requirements and Trade-Offs in Operating Systems for Wearables: A Tool to Improve Devices’ Performance [O]. Vicente J. P. Amorim, Mateus C. Silva, Ricardo A. R. Oliveira. 2019.
7. Improving the Data Cache Performance of Multiprocessor Operating Systems [O]. Chun Xia, Josep Torrellas. 1996.
