Memory Hierarchy for Web Search

Abstract

Online data-intensive services, such as search, serve billions of users, utilize millions of cores, and comprise a significant and growing portion of datacenter-scale workloads. However, the complexity of these workloads and their proprietary nature have precluded detailed architectural evaluations and optimizations of processor design trade-offs. We present the first detailed study of the memory hierarchy for the largest commercial search engine today. We combine several methods: measurements from longitudinal studies across tens of thousands of deployed servers, systematic microarchitectural evaluation on individual platforms, validated trace-driven simulation, and performance modeling. All of these are driven by production workloads servicing real-world user requests. Our data quantifies significant differences between production search and the benchmarks commonly used in the architecture community. We identify the memory hierarchy as an important opportunity for performance optimization, and present new insights into how search stresses the cache hierarchy for both instructions and data. We show that, contrary to conventional wisdom, there is significant reuse of data that is not captured by current cache hierarchies, and discuss why this precludes state-of-the-art tiled and scale-out architectures. Based on these insights, we rethink the cache hierarchy and propose a design optimized for search. It trades the inefficient use of L3 cache transistors for higher-performance cores and adds a latency-optimized on-package eDRAM L4 cache. Compared to state-of-the-art processors, our proposed design performs 27% to 38% better.
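A toy model can illustrate how data reuse can go uncaptured. Data can be reused heavily and still miss in a cache whose capacity is smaller than the reuse distance. The sketch below is not the paper's methodology, which relies on production traces and validated simulation. It assumes a fully associative LRU cache and a synthetic cyclic access trace. The function name `lru_hit_rate`, the 1,000-block working set, and the two capacities are hypothetical values chosen only for illustration.

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` blocks.

    Illustrative model only (assumption): real caches are set-associative
    and multi-level, and this is not how the paper evaluated search.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits / len(trace) if trace else 0.0

# Hypothetical workload: 10 passes over 1,000 distinct blocks,
# so every block is reused 9 times after its first access.
working_set = 1000
trace = [b for _ in range(10) for b in range(working_set)]

print(lru_hit_rate(trace, capacity=800))   # 0.0: reuse distance exceeds capacity
print(lru_hit_rate(trace, capacity=4000))  # 0.9: a larger (L4-like) cache captures the reuse
```

The cyclic trace is the worst case for LRU. Every block is evicted just before it is needed again, so the 800-block cache gets no hits despite abundant reuse. Once capacity exceeds the working set, every access after the first pass hits. This is the intuition behind adding a large cache level rather than relying on smaller private or tiled caches.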