Chabok: a Map-Reduce based method to solve data warehouse problems

Abstract

Traditional database management systems cannot currently manage immense quantities of data; such data must instead be managed by big data solutions built on shared-nothing architectures. Data warehouse systems address very large amounts of information. The most prominent data warehouse model is the star schema, which consists of a fact table and a number of dimension tables. Executing queries on the data warehouse requires joining the fact table with the dimensions. In a shared-nothing architecture, not all of the required information resides on a single node, so it must be retrieved from other nodes, which causes network congestion and slow query execution. To avoid this problem and achieve maximum parallelism, dimensions can be replicated across nodes if they are not too large. However, if a dimension's data volume exceeds the capacity of a node, or if the combined data volume of the dimensions exceeds node capacity, query execution faces serious problems. In big data problems the amount of data is immense, so replicating it cannot be considered an appropriate method. In this paper, we propose a method called Chabok, which uses two-phase Map-Reduce to solve the data warehouse problem. In this method, aggregation is performed entirely on the Mappers, and intermediate results are sent to the Reducer. Chabok does not need data replication to omit joins. The proposed method was implemented on Hadoop, and TPC-DS queries were executed for benchmarking. In terms of query execution time, Chabok outperformed prominent big data products for data warehousing.
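
The idea of aggregating on the Mappers and sending only intermediate results to the Reducer can be illustrated with a minimal, single-process sketch. This is not the Chabok implementation, whose details the abstract does not give; it only shows generic map-side partial aggregation. The row layout, the function names `map_partial_aggregate` and `reduce_merge`, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical fact row: (group key, numeric measure). Not taken from the paper.
FactRow = Tuple[str, float]


def map_partial_aggregate(partition: Iterable[FactRow]) -> Dict[str, float]:
    """Mapper side: sum the measure per key using only this node's rows."""
    partial: Dict[str, float] = defaultdict(float)
    for key, measure in partition:
        partial[key] += measure
    return dict(partial)


def reduce_merge(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Reducer side: merge the small per-mapper results into final totals."""
    totals: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            totals[key] += value
    return dict(totals)


# Two partitions stand in for fact data spread over two shared-nothing nodes.
partitions: List[List[FactRow]] = [
    [("store_1", 10.0), ("store_2", 5.0), ("store_1", 2.5)],
    [("store_2", 1.0), ("store_3", 4.0)],
]

# Each mapper emits one value per key instead of one value per row.
partials = [map_partial_aggregate(p) for p in partitions]
result = reduce_merge(partials)
print(result)
```

Because each mapper emits at most one value per key, the data crossing the network grows with the number of distinct keys rather than with the number of fact rows, which is the kind of traffic reduction the abstract is concerned with.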