
Efficient query processing framework for big data warehouse: an almost join-free approach


Abstract

The rapidly increasing scale of data warehouses is challenging today's data analytical technologies. A conventional data analytical platform processes data warehouse queries using a star schema - it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users' demands. This model is space-economical. However, it faces two problems when applied to big data. First, join is an expensive operation, which prevents a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, even though many join results could be reused by different queries. In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the post-processing phase, so that data warehouse queries can be transformed into massively parallelized filter-aggregation operations on the fact table. In contrast to the conventional query processing models, our approach is efficient, scalable and stable despite the large number of tables involved in the join. It is especially suitable for a large-scale parallel data warehouse. Our empirical evaluation on Hadoop shows that our framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude.
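The idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' framework and makes several assumptions: the tables, the column names, and the helper functions `preprocess`, `filter_aggregate`, and `postprocess` are all hypothetical. It only shows the general shape of the approach: dimension attributes needed by predicates are resolved once in a pre-processing step, each query then becomes a filter-aggregation scan over the fact rows, and descriptive attributes are attached to the small aggregated result afterwards.

```python
from collections import defaultdict

# Hypothetical toy star schema: one fact table and two dimension tables.
customers = {1: {"region": "ASIA", "name": "Alice"},
             2: {"region": "EUROPE", "name": "Bob"}}
products = {10: {"category": "books", "label": "Paperback"},
            11: {"category": "toys", "label": "Robot"}}
sales = [  # (customer_id, product_id, revenue)
    (1, 10, 5.0), (1, 11, 7.0), (2, 10, 3.0), (2, 11, 4.0), (1, 10, 2.0),
]


def preprocess(fact_rows):
    """Pre-processing: resolve the dimension attributes used in predicates once."""
    return [
        {"customer_id": c, "product_id": p, "revenue": r,
         "region": customers[c]["region"],
         "category": products[p]["category"]}
        for c, p, r in fact_rows
    ]


def filter_aggregate(rows, predicate, group_key, measure):
    """Query time: a single scan that filters and sums, with no join."""
    totals = defaultdict(float)
    for row in rows:
        if predicate(row):
            totals[row[group_key]] += row[measure]
    return dict(totals)


def postprocess(totals, dimension, attribute):
    """Post-processing: attach a descriptive attribute to the aggregated result.

    Assumes the attribute is unique per key in this toy data.
    """
    return {dimension[key][attribute]: value for key, value in totals.items()}


wide = preprocess(sales)
asia_by_product = filter_aggregate(
    wide, lambda r: r["region"] == "ASIA", "product_id", "revenue")
report = postprocess(asia_by_product, products, "label")
```

Because the scan in `filter_aggregate` touches each fact row independently, it could in principle be split across partitions and the partial sums merged, which is the property the abstract relies on for parallel execution.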

Bibliographic record

  • Source
    Frontiers of computer science in China | 2015, Issue 2 | pp. 224-236 | 13 pages
  • Author affiliations

    DEKE Lab (Renmin University of China), Beijing 100872, China,School of Information, Renmin University of China, Beijing 100872, China,School of Computing, National University of Singapore, Singapore 117417, Singapore;

    DEKE Lab (Renmin University of China), Beijing 100872, China,School of Information, Renmin University of China, Beijing 100872, China;

    DEKE Lab (Renmin University of China), Beijing 100872, China;

    DEKE Lab (Renmin University of China), Beijing 100872, China,School of Information, Renmin University of China, Beijing 100872, China;

    DEKE Lab (Renmin University of China), Beijing 100872, China;

    DEKE Lab (Renmin University of China), Beijing 100872, China,School of Information, Renmin University of China, Beijing 100872, China;

    DEKE Lab (Renmin University of China), Beijing 100872, China,School of Information, Renmin University of China, Beijing 100872, China;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    data warehouse; large scale; TAMP; join-free; multi-version schema;

