Histogram construction using adaptive random sampling with cross-validation for database systems

Abstract

Adaptive random sampling with cross-validation helps determine when enough of a database has been sampled to construct histograms, within a desired or predetermined degree of accuracy, on one or more columns of one or more tables of the database. An adaptive random sampling histogram construction tool builds an approximate equi-height k-histogram from an initial sample of data values and then iteratively updates it with additional samples of data values until the histogram is within the desired degree of accuracy. At each iteration, the accuracy of the histogram is cross-validated against the additional sample, and that sample is then used to update the histogram to help improve its accuracy. Accuracy may be measured, using a suitable error metric, as the error in the distribution of the additional sample over the histogram, compared against a threshold error. By attempting to sample only as many data values as are needed to construct the histogram within the desired degree of accuracy, the tool attempts to avoid the extra time and memory cost of sampling too many data values.
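The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the maximum relative bucket-count deviation used as the error metric, the doubling of the sample at each round, sampling with replacement, and the `max_rounds` cap are all assumptions made here for illustration; the abstract only calls for "a suitable error metric" and a threshold.

```python
import random
from bisect import bisect_right


def equi_height_separators(sorted_sample, k):
    """Pick k-1 separators that split a sorted sample into k near-equal buckets."""
    n = len(sorted_sample)
    return [sorted_sample[(i * n) // k] for i in range(1, k)]


def bucket_counts(values, separators):
    """Count how many values fall into each of the len(separators)+1 buckets."""
    counts = [0] * (len(separators) + 1)
    for v in values:
        # A value equal to a separator goes to the bucket above that separator.
        counts[bisect_right(separators, v)] += 1
    return counts


def max_relative_error(values, separators):
    """Assumed error metric: largest bucket-count deviation from the ideal
    equal height, as a fraction of that ideal height."""
    expected = len(values) / (len(separators) + 1)
    counts = bucket_counts(values, separators)
    return max(abs(c - expected) for c in counts) / expected


def adaptive_histogram(data, k, threshold, initial_size, rng, max_rounds=10):
    """Build an approximate equi-height k-histogram by adaptive sampling.

    Each round draws an additional sample, cross-validates the current
    separators against it, then merges it into the accumulated sample and
    rebuilds the separators. Stops once the validation error is within
    the threshold (or after max_rounds, an assumed safety cap).
    """
    sample = sorted(rng.choices(data, k=initial_size))
    separators = equi_height_separators(sample, k)
    for _ in range(max_rounds):
        # Assumption: the additional sample is as large as the current one.
        extra = rng.choices(data, k=len(sample))
        error = max_relative_error(extra, separators)
        sample = sorted(sample + extra)
        separators = equi_height_separators(sample, k)
        if error <= threshold:
            break
    return separators, len(sample)


if __name__ == "__main__":
    column = list(range(100_000))  # hypothetical column values
    seps, used = adaptive_histogram(
        column, k=10, threshold=0.25, initial_size=500, rng=random.Random(0)
    )
    print(f"sampled {used} values; separators: {seps}")
```

Validating against the fresh sample before merging it keeps the check independent of the data that produced the current separators, which is the point of the cross-validation step.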
