Published in: International Journal of Computer Systems Science & Engineering

Integrating outlier removal into existing histogram construction methods for geographic data

Abstract

Histograms are widely used to estimate selectivity in query optimization. In this paper, we propose a new approach to improving the accuracy of histograms for multi-dimensional geographic data. The idea is to remove outliers from the histogram buckets where appropriate. The aim of removing them is to make the data distribution within each bucket's area more uniform, and thereby improve the histogram's accuracy. Although histogram construction and outlier detection have each been studied extensively, no previous work has integrated the two to improve histogram accuracy. We therefore first explain why removing outliers benefits histograms. We then describe a simple yet effective algorithm that detects and removes outliers from histogram buckets. The algorithm is designed specifically for histogram buckets and can be integrated easily into existing histogram construction methods. Extensive experiments on real-life data sets show that the proposed approach improves the accuracy of existing histogram construction methods by a factor of two on average.
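The abstract does not spell out the detection rule, so the following is only a minimal sketch of why outliers hurt a bucket's uniformity assumption, not the authors' algorithm. It assumes each bucket is summarised by a minimum bounding rectangle (MBR) and a tuple count, and that a range query is estimated as count × (overlap area / bucket area). The outlier rule is a hypothetical stand-in: repeatedly drop the point whose removal shrinks the MBR the most, as long as the area falls to at most `shrink_ratio` of its current value, and keep the removed points exactly. The names `Bucket`, `build_bucket`, `shrink_ratio` and `max_outliers` are illustrative, not from the paper.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(points: list[Point]) -> Rect:
    """Minimum bounding rectangle of a non-empty point list."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def area(r: Rect) -> float:
    # Negative extents (empty intersections) count as zero area.
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersect(a: Rect, b: Rect) -> Rect:
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def contains(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


@dataclass
class Bucket:
    box: Rect
    count: int
    outliers: list[Point] = field(default_factory=list)  # stored exactly

    def estimate(self, query: Rect) -> float:
        # Uniformity assumption: tuples are spread evenly over the box.
        a = area(self.box)
        inter = intersect(self.box, query)
        est = 0.0
        if a > 0:
            est = self.count * area(inter) / a
        elif inter[0] <= inter[2] and inter[1] <= inter[3]:
            est = float(self.count)  # degenerate box that touches the query
        # Removed outliers are counted exactly rather than estimated.
        return est + sum(1 for p in self.outliers if contains(query, p))


def build_bucket(points: list[Point], shrink_ratio: float = 0.5,
                 max_outliers: int = 3) -> Bucket:
    """Hypothetical rule: drop the point whose removal shrinks the MBR most,
    while that removal cuts the area to <= shrink_ratio of its current value."""
    inliers = list(points)
    outliers: list[Point] = []
    while len(outliers) < max_outliers and len(inliers) > 2:
        current = area(mbr(inliers))
        if current == 0:
            break
        best_i, best_area = None, current
        for i in range(len(inliers)):  # O(n^2) per step; fine for a sketch
            a = area(mbr(inliers[:i] + inliers[i + 1:]))
            if a < best_area:
                best_i, best_area = i, a
        if best_i is None or best_area > shrink_ratio * current:
            break
        outliers.append(inliers.pop(best_i))
    return Bucket(mbr(inliers), len(inliers), outliers)


# Nine points spread over the unit square, plus one far-away point.
points = [(x * 0.5, y * 0.5) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
plain = Bucket(mbr(points), len(points))  # no outlier removal
cleaned = build_bucket(points)
print(plain.estimate((0, 0, 1, 1)))    # → 0.1 (true answer is 9)
print(cleaned.estimate((0, 0, 1, 1)))  # → 9.0
```

In this toy example the single point at (10, 10) stretches the plain bucket's box to 100 times the area the other nine points occupy, so the uniform estimate for the unit square collapses to 0.1. Once that point is held aside, the remaining box is genuinely uniform and the estimate matches the true count.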