Venue: International Conference on Management of Data

Latent OLAP: Data Cubes over Latent Variables


Abstract

We introduce a novel class of data cube, called the latent-variable cube. For many data analysis tasks, data in a database can be represented as points in a multi-dimensional space. Ordinary data cubes compute aggregate functions over these "observed" data points for each cell (i.e., region) in the space, where the cells have different granularities defined by hierarchies. While useful, data cubes do not provide sufficient capability for analyzing "latent variables" that are often of interest but not directly observed in data. For example, when analyzing users' interactions with online advertisements, the observed data record whether a user clicked an ad or not. However, the real interest is often in knowing the click probabilities of ads for different user populations. In this example, click probabilities are latent variables that are not observed but have to be estimated from data. We argue that latent variables are a useful construct for a number of OLAP application scenarios. To facilitate such analyses, we propose cubes that compute aggregate functions over latent variables. Specifically, we discuss the pitfalls of common practice in scenarios where latent variables should be, but are not, considered; we rigorously define the latent-variable cube based on Bayesian hierarchical models and provide efficient algorithms. Through extensive experiments on both simulated and real data, we show that our method is accurate and runs orders of magnitude faster than the baseline.
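To make the click-probability example concrete, the following is a minimal sketch, not the paper's algorithm. It contrasts the ordinary cube aggregate (observed clicks divided by views) with a simple Beta-Binomial estimate that shrinks each cell toward its parent cell's pooled rate. The `Cell` structure, the cell names, the counts, and the `strength` pseudo-count are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Observed counts for one cube cell (hypothetical structure)."""
    name: str
    clicks: int
    views: int


def raw_rate(cell: Cell) -> float:
    """Ordinary cube aggregate: observed clicks divided by observed views."""
    return cell.clicks / cell.views if cell.views else 0.0


def posterior_mean(cell: Cell, prior_a: float, prior_b: float) -> float:
    """Posterior mean of the click probability under a Beta(prior_a, prior_b) prior."""
    return (cell.clicks + prior_a) / (cell.views + prior_a + prior_b)


def parent_prior(children: list[Cell], strength: float) -> tuple[float, float]:
    """Build a Beta prior centered on the parent cell's pooled click rate.

    `strength` is an assumed pseudo-count controlling how strongly each
    child is pulled toward the parent; it is not taken from the source.
    """
    clicks = sum(c.clicks for c in children)
    views = sum(c.views for c in children)
    pooled = clicks / views if views else 0.5
    return pooled * strength, (1.0 - pooled) * strength


# Made-up counts: one sparse cell and two well-populated cells.
cells = [Cell("age<25", 1, 2), Cell("age25-40", 30, 1000), Cell("age>40", 45, 1500)]
a, b = parent_prior(cells, strength=50.0)
estimates = {c.name: (raw_rate(c), posterior_mean(c, a, b)) for c in cells}

for name, (raw, shrunk) in estimates.items():
    print(f"{name}: raw={raw:.3f} shrunk={shrunk:.3f}")
```

With these made-up counts, the sparse cell's raw rate is driven entirely by two views, while the shrunken estimate stays close to the parent's rate. That is the kind of pitfall the abstract attributes to ignoring latent variables. A full Bayesian hierarchical model would also estimate the prior parameters rather than fix `strength` by hand.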
