Hierarchical clustering for OLAP: the CUBE File approach

Nikos Karayannidis; Timos Sellis

Abstract

This paper deals with the problem of physically clustering, on disk and in a hierarchy-preserving manner, multidimensional data that are organized in hierarchies. This is called hierarchical clustering. A typical case in which hierarchical clustering is necessary for reducing I/Os during query evaluation is the most detailed data of an OLAP cube. The presence of hierarchies in the multidimensional space results in an enormous search space for this problem. We propose a representation of the data space that results in a chunk-tree representation of the cube. The model adapts to the cube's extensive sparseness and provides efficient access to subsets of data based on hierarchy value combinations. Based on this representation of the search space, we formulate the problem as a chunk-to-bucket allocation problem, which is a packing problem, as opposed to the linear ordering approach followed in the literature. We propose a metric to evaluate the quality of the hierarchical clustering achieved (i.e., to evaluate the solutions to the problem) and formulate the problem as an optimization problem. We prove its NP-hardness and provide an effective solution based on a linear-time greedy algorithm. The solution of this problem leads to the construction of the CUBE File data structure. We analyze in depth all steps of the construction and provide solutions for the interesting sub-problems that arise, such as the formation of bucket-regions, the storage of large data chunks, and the caching of the upper nodes (root directory) in main memory. Finally, we provide an extensive experimental evaluation of the CUBE File's adaptability to the sparseness of the data space as well as to an increasing number of data points. The main result is that the CUBE File is highly adaptive to even the sparsest data spaces and, for realistic data point cardinalities, provides hierarchical clustering of high quality and significant space savings.
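
To make the idea of chunk-to-bucket allocation as a packing problem more concrete, the following Python sketch walks a chunk tree top-down and stores each largest subtree that fits into a fixed-size bucket as a unit. It is only an illustration under stated assumptions and is not the algorithm from the paper: the names `Chunk`, `subtree_sizes`, and `allocate`, the integer size model, and the requirement of unique chunk names are all hypothetical. It also leaves out bucket-regions (packing several small subtrees into one bucket) and the special handling of chunks larger than a bucket.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    """Hypothetical chunk-tree node; `size` is the storage cost of this chunk alone."""
    name: str  # assumed unique within the tree
    size: int
    children: List["Chunk"] = field(default_factory=list)


def subtree_sizes(root: Chunk) -> Dict[str, int]:
    """Compute the total size of every subtree in one post-order pass."""
    sizes: Dict[str, int] = {}

    def walk(chunk: Chunk) -> int:
        total = chunk.size + sum(walk(child) for child in chunk.children)
        sizes[chunk.name] = total
        return total

    walk(root)
    return sizes


def allocate(root: Chunk, capacity: int) -> Tuple[List[List[str]], List[str]]:
    """Greedy top-down packing (illustrative only).

    Returns (buckets, upper): each bucket lists the chunks of one whole
    subtree that fits within `capacity`; `upper` lists chunks whose subtree
    does not fit and which therefore stay outside the buckets. An oversized
    leaf also ends up in `upper`; this sketch does not handle it further.
    """
    sizes = subtree_sizes(root)
    buckets: List[List[str]] = []
    upper: List[str] = []

    def names(chunk: Chunk) -> List[str]:
        out = [chunk.name]
        for child in chunk.children:
            out.extend(names(child))
        return out

    def visit(chunk: Chunk) -> None:
        if sizes[chunk.name] <= capacity:
            # The whole subtree fits: keep it together in one bucket.
            buckets.append(names(chunk))
        else:
            # Too large: keep this chunk in the upper levels and descend.
            upper.append(chunk.name)
            for child in chunk.children:
                visit(child)

    visit(root)
    return buckets, upper
```

Each node is touched a constant number of times, so the sketch runs in time linear in the number of chunks. For a root of size 1 with two children whose subtrees have total sizes 7 and 3, a capacity of 8 puts each child subtree in its own bucket and leaves only the root in the upper level.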

Bibliographic details

Source
The VLDB Journal | 2008, Issue 4 | pp. 621-655 | 35 pages
Authors
Nikos Karayannidis; Timos Sellis
Affiliation
Institute of Communication and Computer Systems and School of Electrical and Computer Engineering, National Technical University of Athens, Zographou 15773, Athens, Greece
Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
hierarchical clustering; OLAP; CUBE file; data cube; physical data clustering
