We recently proposed an adaptive random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm; in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide "sanity bounds" to deal with queries for which the underlying data is extremely skewed or the query result is very small. Finally, we report on the performance of the estimation algorithm as implemented in a host language on a commercial relational system. The results are encouraging, even with this loose coupling between the estimation algorithm and the DBMS.
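To make the idea concrete, the following is a minimal sketch of adaptive sampling for the size of a select, not the algorithm as analyzed in the paper. It assumes the relation is an in-memory sequence, samples tuples uniformly with replacement, and stops either when enough matching tuples have been seen or when a cap on the number of samples is reached. The cap plays the role of a sanity bound for very small results. The function name `estimate_select_size` and the parameters `target_hits` and `max_samples` are hypothetical, and their default values are illustrative rather than the bounds derived in the paper.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def estimate_select_size(
    rows: Sequence[T],
    predicate: Callable[[T], bool],
    target_hits: int = 50,       # hypothetical stopping threshold on matches
    max_samples: int = 2000,     # hypothetical "sanity bound" on total samples
    rng: Optional[random.Random] = None,
) -> float:
    """Estimate how many rows satisfy `predicate` by adaptive sampling.

    Sampling is adaptive because the number of samples drawn depends on
    what has been observed so far: selective predicates need more samples
    to reach `target_hits`, and `max_samples` caps the work when the
    result is very small or empty.
    """
    if target_hits < 1 or max_samples < 1:
        raise ValueError("target_hits and max_samples must be positive")
    if not rows:
        return 0.0
    rng = rng or random.Random()

    hits = 0
    samples = 0
    while hits < target_hits and samples < max_samples:
        row = rows[rng.randrange(len(rows))]  # uniform, with replacement
        samples += 1
        if predicate(row):
            hits += 1

    # Scale the observed match fraction up to the size of the relation.
    return len(rows) * hits / samples
```

For example, `estimate_select_size(list(range(10000)), lambda x: x < 2500)` returns an estimate close to 2500, while a predicate that never matches stops at `max_samples` and returns 0.0 instead of sampling indefinitely.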
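ARSAC: Efficient model estimation via adaptively ranked sample consensus
An objective re-evaluation of adaptive sample size re-estimation: commentary on "Twenty-five years of confirmatory adaptive designs"
Adaptive designs at the European Organisation for Research and Treatment of Cancer (EORTC), with a focus on adaptive sample size re-estimation based on interim effect size
Adaptive sampling for selectivity estimation in spatial databases
Autonomous ocean sensing: a practical approach to modeling, adaptive whitening, and position estimation
Adaptive two-stage bioequivalence trials with early stopping and sample size re-estimation
Practical selectivity estimation through adaptive sampling
Practical implementation of multiple model adaptive estimation using Neyman-Pearson based hypothesis testing and spectral estimation tools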