Logical Schema for Data Warehouse on Column-Oriented NoSQL Databases

Abstract

Column-oriented NoSQL systems offer a flexible, highly denormalized data schema that facilitates data warehouse scalability. Implementing a data warehouse on a NoSQL database is nevertheless challenging, because it involves a distributed data management policy across multi-node clusters. In column-oriented NoSQL systems in particular, query performance can be improved by careful data grouping. In this paper, we present a method that uses clustering techniques, in particular k-means, to derive better-suited column families from existing fact and dimension tables. To validate our method, we adopt the TPC-DS benchmark. We conducted several experiments to examine the benefits of clustering techniques for creating column families in the column-oriented NoSQL database HBase on the Hadoop platform. Our experiments suggest that defining a good data grouping in HBase when implementing a data warehouse significantly improves the performance of decision-support queries.
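
As an illustration of the grouping idea, the following is a minimal sketch of k-means applied to warehouse attributes. The abstract does not say which features the authors cluster on, so this sketch assumes each attribute is described by a binary vector recording which queries of a workload read it. The usage matrix, the function names, and the `cf0`/`cf1` family labels are all hypothetical, and the deterministic farthest-point initialisation is a simplification chosen for reproducibility, not the paper's procedure.

```python
from typing import Dict, List


def squared_distance(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns one cluster label per input point."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_into_column_families(
    usage: Dict[str, List[float]], k: int
) -> Dict[str, List[str]]:
    """Map each cluster of attributes to a hypothetical column family name."""
    names = sorted(usage)
    labels = kmeans([usage[name] for name in names], k)
    families: Dict[str, List[str]] = {}
    for name, label in zip(names, labels):
        families.setdefault(f"cf{label}", []).append(name)
    return families


# Hypothetical workload: entry i is 1 if query i reads the attribute.
usage = {
    "ss_quantity": [1, 1, 0, 0],
    "ss_sales_price": [1, 1, 0, 0],
    "ss_net_profit": [1, 1, 1, 0],
    "d_year": [0, 0, 1, 1],
    "d_moy": [0, 0, 1, 1],
}

families = group_into_column_families(usage, k=2)
for family, columns in sorted(families.items()):
    print(family, columns)
```

Attributes that tend to be read by the same queries end up in the same group, which is the intuition behind storing them in one HBase column family. The number of groups `k` is a free parameter here; how it should be chosen for a real warehouse is not addressed by this sketch.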