On High Dimensional Skylines

Abstract

In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multidimensional dataset. In a high-dimensional space, skyline points no longer offer any interesting insights because there are too many of them. In this paper, we introduce a novel metric, called skyline frequency, that compares and ranks the interestingness of data points based on how often they are returned in the skyline when different subsets of the dimensions (i.e., subspaces) are considered. Intuitively, a point with a high skyline frequency is more interesting, as it can be dominated on fewer combinations of the dimensions. Thus, the problem becomes one of finding the top-k frequent skyline points. However, the algorithms proposed thus far for skyline computation typically do not scale well with dimensionality. Moreover, frequent skyline computation requires that skylines be computed for each of an exponential number of subsets of the dimensions. We present efficient approximate algorithms to address these twin difficulties. Our extensive performance study shows that our approximate algorithm can run fast and compute the correct result on large data sets in high-dimensional spaces.
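
To make the definition of skyline frequency concrete, the following is a minimal brute-force sketch, not the approximate algorithm the paper proposes. It assumes that smaller values are preferred on every dimension and that a point's skyline frequency is the number of non-empty subspaces in which it belongs to the skyline. The function names (`dominates`, `skyline`, `skyline_frequencies`, `top_k_frequent`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def dominates(p, q, dims):
    """True if p dominates q on dims (assumption: smaller is better).

    p must be no worse than q on every dimension in dims and strictly
    better on at least one of them.
    """
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skyline(points, dims):
    """Indices of the points that no other point dominates on dims."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p, dims) for j, q in enumerate(points) if j != i)
    ]


def skyline_frequencies(points):
    """For each point, count the non-empty subspaces in which it is a skyline point."""
    n_dims = len(points[0])
    freq = [0] * len(points)
    # Enumerate every non-empty subset of the dimensions: 2**n_dims - 1 subspaces.
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            for i in skyline(points, dims):
                freq[i] += 1
    return freq


def top_k_frequent(points, k):
    """The k points with the highest skyline frequency, as (index, frequency) pairs."""
    freq = skyline_frequencies(points)
    # Sort by descending frequency; break ties by the lower index.
    order = sorted(range(len(points)), key=lambda i: (-freq[i], i))
    return [(i, freq[i]) for i in order[:k]]
```

Because the outer loops visit all 2^d - 1 non-empty subspaces of a d-dimensional dataset and each skyline is computed by pairwise comparison, this sketch is only practical for small d. That exponential cost is the difficulty the abstract names as the motivation for approximate algorithms.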