Journal: Data & Knowledge Engineering

An empirical study on selective partitioning dimensions for partition-based similarity joins


Abstract

Real-world application data are usually distributed sparsely and non-uniformly in a very large, high-dimensional space. Hence, selecting effective partitioning dimensions is crucial for partition-based similarity joins. In this paper, we present and evaluate two data partitioning algorithms. PerDimSelect selects a subset of the original perpendicular dimension axes and maps each data point into the reduced-dimensional space. DiaDimSelect creates a single one-dimensional axis by combining some of the original perpendicular dimensions and maps each data point onto the newly created axis. In the experiments, several measures are used to compare the performance of the algorithms, including CPU cost, total response time, and the number of buckets created. In conclusion, DiaDimSelect performs better than PerDimSelect because it creates far fewer partition buckets as the number of partitioning dimensions increases, which keeps the I/O cost low while considerably reducing CPU cost.
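The contrast between the two mappings can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how dimensions are chosen or combined, so the code below assumes the partitioning dimensions are already given, that buckets have width `eps` (taken here to be the join distance threshold), and that the combined axis is the diagonal of the chosen dimensions. The function names `per_dim_buckets` and `dia_dim_buckets` are hypothetical.

```python
import math
from collections import defaultdict


def per_dim_buckets(points, dims, eps):
    """Grid-partition points on a chosen subset of the original axes.

    Assumption: each selected axis is cut into cells of width eps, so the
    bucket key is a tuple with one cell index per selected dimension.
    """
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(p[d] / eps) for d in dims)
        buckets[key].append(idx)
    return buckets


def dia_dim_buckets(points, dims, eps):
    """Partition points along one axis built from the chosen dimensions.

    Assumption: the combined axis is the diagonal (1, ..., 1) / sqrt(k) of
    the k chosen dimensions; each point is projected onto it and the
    projection is cut into cells of width eps.
    """
    scale = math.sqrt(len(dims))
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        proj = sum(p[d] for d in dims) / scale
        buckets[math.floor(proj / eps)].append(idx)
    return buckets


# Four corner-ish points of the unit square, bucket width 0.1.
points = [(0.05, 0.05), (0.95, 0.05), (0.05, 0.95), (0.95, 0.95)]
per = per_dim_buckets(points, dims=[0, 1], eps=0.1)
dia = dia_dim_buckets(points, dims=[0, 1], eps=0.1)
print(len(per), len(dia))
```

Under these assumptions, the number of possible grid cells grows exponentially with the number of selected axes, while the number of cells on the single diagonal axis grows only with the length of the projected range. Because projection onto a unit vector never increases Euclidean distance, two points within `eps` of each other fall into the same or adjacent diagonal buckets, so a join only needs to compare neighbouring buckets.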
