A Comparison of Selectivity Estimators for Range Queries on Metric Attributes



Abstract

In this paper, we present a comparison of nonparametric estimation methods for computing approximations of the selectivities of queries, in particular range queries. In contrast to previous studies, the focus of our comparison is on metric attributes with large domains, which occur, for example, in spatial and temporal databases. We also assume that only small sample sets of the required relations are available for estimating the selectivity. In addition to the popular histogram estimators, our comparison includes so-called kernel estimation methods. Although these methods have been proven to be among the most accurate estimators known in statistics, they have not so far been considered for selectivity estimation of database queries. We first show how to generate kernel estimators that deliver accurate approximate selectivities of queries. Thereafter, we reveal that two parameters, the number of samples and the so-called smoothing parameter, are important for the accuracy of both kernel estimators and histogram estimators. For histogram estimators, the smoothing parameter determines the number of bins (histogram classes). We present the optimal smoothing parameter as a function of the number of samples and show how to compute approximations of the optimal parameter. Moreover, we propose a new selectivity estimator that can be viewed as a hybrid of histogram and kernel estimators. Experimental results show the performance of different estimators in practice. We found in our experiments that kernel estimators are most efficient for continuously distributed data sets, whereas for our real data sets the hybrid technique is most promising.
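To make the two estimator families named in the abstract concrete, the following is a minimal Python sketch of how a range-query selectivity could be estimated from a small sample, once with an equi-width histogram and once with a kernel estimator. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of the Epanechnikov kernel, and the bin count and bandwidth used in the usage example are all assumptions, and the abstract does not give the paper's formula for the optimal smoothing parameter. The hybrid estimator is not sketched because the abstract does not describe its construction.

```python
import random


def histogram_selectivity(sample, lo, hi, a, b, bins):
    """Estimate the fraction of tuples with a <= X <= b from an
    equi-width histogram over the domain [lo, hi].
    `bins` plays the role of the smoothing parameter (assumed name)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in sample:
        # Clamp so that x == hi (or a stray out-of-domain value) lands in a valid bin.
        i = min(max(int((x - lo) / width), 0), bins - 1)
        counts[i] += 1
    total = 0.0
    for i, c in enumerate(counts):
        left = lo + i * width
        right = left + width
        # Assume values are spread uniformly inside each bin.
        overlap = max(0.0, min(b, right) - max(a, left))
        total += c * overlap / width
    return total / len(sample)


def epanechnikov_cdf(u):
    """CDF of the Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def kernel_selectivity(sample, a, b, h):
    """Estimate the fraction of tuples with a <= X <= b by integrating a
    kernel density estimate over [a, b]. `h` is the bandwidth, i.e. the
    smoothing parameter (assumed name). Boundary effects are ignored."""
    total = 0.0
    for x in sample:
        total += epanechnikov_cdf((b - x) / h) - epanechnikov_cdf((a - x) / h)
    return total / len(sample)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical small sample drawn from a continuous attribute on [0, 100].
    sample = [min(max(random.gauss(50, 15), 0.0), 100.0) for _ in range(200)]
    # Illustrative parameter choices only, not the optimal values from the paper.
    print(histogram_selectivity(sample, 0.0, 100.0, 40.0, 60.0, bins=10))
    print(kernel_selectivity(sample, 40.0, 60.0, h=8.0))
```

Both functions return a value in [0, 1]. Varying `bins` or `h` on the same sample shows the sensitivity to the smoothing parameter that the abstract describes: too few bins or too large a bandwidth oversmooths the distribution, while too many bins or too small a bandwidth makes the estimate follow sampling noise.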


Bibliographic record

Source: ACM SIGMOD International Conference on Management of Data, 1999, 12 pages
Authors: Bjorn Blohsfeld; Dieter Korus; Bernhard Seeger
Original format: PDF
Chinese Library Classification: TP3-532

