Conference: International conference on management of data

Latent OLAP: Data Cubes over Latent Variables

Abstract

We introduce a novel class of data cube, called the latent-variable cube. For many data analysis tasks, data in a database can be represented as points in a multi-dimensional space. Ordinary data cubes compute aggregate functions over these "observed" data points for each cell (i.e., region) in the space, where the cells have different granularities defined by hierarchies. While useful, data cubes do not provide sufficient capability for analyzing "latent variables" that are often of interest but not directly observed in data. For example, when analyzing users' interactions with online advertisements, the observed data indicate whether a user clicked an ad or not. However, the real interest is often in knowing the click probabilities of ads for different user populations. In this example, click probabilities are latent variables that are not observed but have to be estimated from data. We argue that latent variables are a useful construct for a number of OLAP application scenarios. To facilitate such analyses, we propose cubes that compute aggregate functions over latent variables. Specifically, we discuss the pitfalls of common practice in scenarios where latent variables should be, but are not, considered; we rigorously define the latent-variable cube based on Bayesian hierarchical models and provide efficient algorithms. Through extensive experiments on both simulated and real data, we show that our method is accurate and runs orders of magnitude faster than the baseline.
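
The pitfall of treating an observed ratio as the click probability can be illustrated with a small sketch. This is not the algorithm from the paper: it is a simplified empirical-Bayes stand-in that assumes a Beta prior centered on the parent cell's pooled click rate, and the names `Cell`, `parent_prior`, `posterior_mean_ctr`, the `strength` parameter, and the sample counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One cube cell: a user population / ad category with observed counts."""
    name: str
    clicks: int
    views: int


def observed_ctr(cell: Cell) -> float:
    # Ordinary cube aggregate: ratio of observed clicks to observed views.
    return cell.clicks / cell.views if cell.views else 0.0


def parent_prior(cells: list[Cell], strength: float = 20.0) -> tuple[float, float]:
    # Assumption: the prior is a Beta(alpha, beta) whose mean is the pooled
    # click rate of the parent cell; `strength` (alpha + beta) is a
    # hypothetical pseudo-count controlling how hard small cells are shrunk.
    total_clicks = sum(c.clicks for c in cells)
    total_views = sum(c.views for c in cells)
    rate = total_clicks / total_views
    return strength * rate, strength * (1.0 - rate)


def posterior_mean_ctr(cell: Cell, alpha: float, beta: float) -> float:
    # Beta prior + binomial likelihood gives a
    # Beta(alpha + clicks, beta + views - clicks) posterior; return its mean.
    return (alpha + cell.clicks) / (alpha + beta + cell.views)


# Made-up counts for illustration only.
cells = [
    Cell("young/sports", clicks=1, views=2),      # tiny sample
    Cell("young/news", clicks=50, views=1000),
    Cell("senior/news", clicks=30, views=1000),
]

alpha, beta = parent_prior(cells)
estimates = {c.name: posterior_mean_ctr(c, alpha, beta) for c in cells}

for c in cells:
    print(f"{c.name}: observed={observed_ctr(c):.3f} estimated={estimates[c.name]:.3f}")
```

Under these assumptions, the cell with only two views is pulled strongly toward the pooled rate, while the cells with a thousand views stay close to their observed ratios. A full Bayesian hierarchical model would estimate the prior parameters from the data rather than fix `strength` by hand.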
