Foreign-language journal: Ecological restoration

A cost-based storage format selector for materialized results in big data frameworks

Abstract

Modern big data frameworks (such as Hadoop and Spark) allow multiple users to perform large-scale analysis simultaneously by deploying data-intensive workflows (DIWs). The DIWs of different users share many common tasks (i.e., 50-80%), whose outputs can be materialized and reused in future executions. Materializing the output of such common tasks reduces the overall processing time of DIWs and also saves computational resources. Current solutions for materialization store data on distributed file systems using a fixed storage format. However, a fixed choice is not optimal for every situation. Specifically, different layouts (i.e., horizontal, vertical, or hybrid) have a large impact on execution, depending on the access patterns of the subsequent operations. In this paper, we present a cost-based approach that helps decide the most appropriate storage format in every situation. We present a generic cost-based framework that selects the best format by considering the three main layouts. We then use our framework to instantiate cost models for specific Hadoop storage formats (namely SequenceFile, Avro, and Parquet) and test it with two standard benchmark suites. Our solution gives, on average, a 1.33x speedup over fixed SequenceFile, a 1.11x speedup over fixed Avro, and a 1.32x speedup over fixed Parquet; overall, it provides a 1.25x speedup.
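
The idea of choosing a layout from the access patterns of subsequent operations can be illustrated with a minimal sketch. This is not the cost model from the paper: the function names (`read_cost`, `select_layout`), the `Access` descriptor, and every constant below are hypothetical, chosen only to show how a cost-based choice among horizontal, vertical, and hybrid layouts might be structured. Write costs are ignored for brevity.

```python
from dataclasses import dataclass

LAYOUTS = ("horizontal", "vertical", "hybrid")


@dataclass(frozen=True)
class Access:
    """One expected read of the materialized result (hypothetical descriptor)."""
    cols_read: int      # number of columns the operation touches
    frequency: int = 1  # how many times the operation is expected to run


def read_cost(layout: str, rows: int, total_cols: int, cols_read: int) -> float:
    """Toy read cost measured in cells scanned; all constants are assumptions."""
    if layout == "horizontal":
        # Row-oriented: every column of every row is scanned.
        return float(rows * total_cols)
    if layout == "vertical":
        # Column-oriented: only referenced columns are read, with an
        # assumed 1.5x penalty for stitching rows back together.
        return rows * cols_read * 1.5
    if layout == "hybrid":
        # Hybrid: smaller assumed penalty plus an assumed fixed overhead
        # proportional to the full row width.
        return rows * cols_read * 1.2 + rows * total_cols * 0.1
    raise ValueError(f"unknown layout: {layout}")


def select_layout(rows: int, total_cols: int, accesses: list[Access]) -> str:
    """Return the layout with the lowest total estimated read cost."""
    def total(layout: str) -> float:
        return sum(
            a.frequency * read_cost(layout, rows, total_cols, a.cols_read)
            for a in accesses
        )
    return min(LAYOUTS, key=total)


if __name__ == "__main__":
    # A full scan favors the row layout; a narrow projection favors columns.
    print(select_layout(1000, 10, [Access(cols_read=10)]))
    print(select_layout(1000, 10, [Access(cols_read=2)]))
    print(select_layout(1000, 10, [Access(cols_read=10), Access(cols_read=2)]))
```

Under these assumed constants, a workload that mixes full scans with narrow projections ends up preferring the hybrid layout, which mirrors the abstract's point that no single fixed format is best for every situation.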
