A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether a sample of a join tree can be generated without first evaluating the join tree completely. We undertake a detailed study of this problem and analyze it in a variety of settings. We present theoretical results that explain the difficulty of this problem and set limits on the efficiency that can be achieved. Based on new insights into the interaction between join and sampling, we develop join sampling techniques for the settings where our negative results do not apply. Our new sampling algorithms are significantly more efficient than those previously known. We present an experimental evaluation of our techniques on Microsoft's SQL Server 7.0.