Adaptive cache policy scheduling for big data applications on distributed tiered storage system

Abstract

Multitiered storage systems, which are made up of heterogeneous devices, are widely used in distributed environments to accelerate the I/O performance of upper-layer big data applications. This raises new challenges for efficient data migration through smart caching mechanisms among heterogeneous storage levels, such as MEM-SSD-HDD. To optimize the cache policy scheduling mechanism on the distributed tiered storage architecture, we propose a general framework with five layers: a tiered storage system layer, a cache migration policy layer, a cache policy adaptive scheduling layer, a data access pattern layer, and a big data application layer. The framework prototype has been designed and implemented on Alluxio, a widely used distributed hybrid storage system. To meet the demands of the big data application layer, on the one hand, we designed a couple of cache eviction policies and promotion policies covering various access patterns on the cache migration policy layer (several of the proposed eviction policies have been adopted by the Alluxio open-source community). On the other hand, we designed and implemented two adaptive cache policy scheduling algorithms on the cache policy adaptive scheduling layer to select appropriate policies in various scenarios; one is based on hit ratio statistics and the other on data access pattern model prediction. Experimental results show that the proposed cache policies are very effective for various big data applications, such as Spark SQL. With various eviction policies available, the proposed cache policy scheduling algorithms improve the hit ratio by around 20% compared with using a single eviction policy.
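
The abstract does not spell out the two scheduling algorithms. As a rough illustration only, the sketch below shows one common way to realize hit-ratio-based policy selection: every candidate eviction policy runs in a metadata-only "shadow" cache over the same access stream, and at the end of each fixed-length window the scheduler activates the policy whose shadow scored the most hits. The class names (`LRUPolicy`, `LFUPolicy`, `CacheTier`, `AdaptiveScheduler`), the promote-on-every-miss rule, the window size, and the synthetic `workload` trace are all hypothetical; they are not taken from the paper or from Alluxio's API. The sketch also assumes the live tier simply mirrors the active shadow and ignores the cost of migrating blocks when the policy switches.

```python
from collections import OrderedDict, defaultdict

class LRUPolicy:
    """Evicts the least recently used block."""
    name = "LRU"

    def __init__(self):
        self.order = OrderedDict()  # least recently used key first

    def on_access(self, key):
        self.order.pop(key, None)
        self.order[key] = True  # re-insert at the most-recent end

    def on_remove(self, key):
        self.order.pop(key, None)

    def victim(self):
        return next(iter(self.order))

class LFUPolicy:
    """Evicts the least frequently used block; ties go to the earliest-cached key."""
    name = "LFU"

    def __init__(self):
        self.freq = {}

    def on_access(self, key):
        self.freq[key] = self.freq.get(key, 0) + 1

    def on_remove(self, key):
        self.freq.pop(key, None)  # counts are forgotten once a block is evicted

    def victim(self):
        return min(self.freq, key=self.freq.get)

class CacheTier:
    """Metadata-only model of one fast tier (e.g. MEM) with a pluggable eviction policy."""

    def __init__(self, capacity, policy):
        self.capacity, self.policy = capacity, policy
        self.blocks = set()
        self.hits = 0

    def access(self, key):
        hit = key in self.blocks
        if hit:
            self.hits += 1
        else:
            # Hypothetical promotion rule: promote every missed block into this tier.
            if len(self.blocks) >= self.capacity:
                victim = self.policy.victim()
                self.blocks.remove(victim)
                self.policy.on_remove(victim)
            self.blocks.add(key)
        self.policy.on_access(key)
        return hit

class AdaptiveScheduler:
    """Runs one shadow tier per candidate policy and activates the best one per window."""

    def __init__(self, capacity, policy_classes, window=100):
        self.shadows = {cls.name: CacheTier(capacity, cls()) for cls in policy_classes}
        self.active = next(iter(self.shadows))  # start with the first candidate
        self.window = window
        self.window_hits = defaultdict(int)
        self.seen = self.served_hits = 0

    def access(self, key):
        served = False
        for name, tier in self.shadows.items():
            hit = tier.access(key)
            self.window_hits[name] += hit
            if name == self.active:
                served = hit  # the live tier is assumed to mirror the active shadow
        self.served_hits += served
        self.seen += 1
        if self.seen % self.window == 0:
            # Every shadow saw the same requests, so comparing hit counts
            # is the same as comparing window hit ratios.
            self.active = max(self.window_hits, key=self.window_hits.get)
            self.window_hits.clear()
        return served

def workload(rounds=20):
    """Hypothetical trace: hot keys 0 and 1 warm up, then alternate with one-off scans."""
    trace = [0, 1] * 5
    for r in range(rounds):
        scan = [100 + 3 * r + i for i in range(3)]  # three never-repeated blocks
        trace += [0, 1] + scan
    return trace

sched = AdaptiveScheduler(capacity=4, policy_classes=[LRUPolicy, LFUPolicy], window=10)
for key in workload():
    sched.access(key)
print(sched.active)                                   # LFU
print({n: t.hits for n, t in sched.shadows.items()})  # {'LRU': 10, 'LFU': 48}
print(sched.served_hits)                              # 46
```

In this toy trace, each scan is as large as the tier, so it flushes the hot keys out of an LRU tier, while LFU keeps them because their access counts stay high. The scheduler therefore moves to LFU after the second window. Comparing shadows over fixed windows is a simple design choice; a real scheduler would also need to weigh the migration cost of switching.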

Bibliographic record

  • Source
    Concurrency, practice and experience | 2019, Issue 15 | e5138.1-e5138.25 | 25 pages
  • Author affiliations

    State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing, China (same affiliation listed for all five authors)

  • Original format: PDF
  • Language: English
  • Keywords

    adaptive scheduling; cache framework; eviction policy; promotion policy; tiered storage
