Introducing Skew into the TPC-H Benchmark


Abstract

While uniform data distributions were a design choice for the TPC-D benchmark and its successor TPC-H, it has been universally recognized that data skew is prevalent in data warehousing. A modern benchmark should therefore provide a test bed to evaluate the ability of database engines to handle skew. This paper presents a concrete and practical way to introduce skew into the TPC-H data model by modifying the customer and supplier tables to reflect non-uniform customer and supplier populations. The first proposal consists of defining customer and supplier populations by nation that are roughly proportional to the actual nation populations. In the second proposal, nations are divided into two groups: one in which every nation has the same large population, and another in which every nation has the same small population. We then experiment with the proposed skew models to show how the optimizer of a parallel system can recognize skew and potentially produce different plans depending on whether skew is present. Query performance under the proposed method is compared with performance under the original uniform TPC-H distributions. Finally, an approach is presented for introducing skew into TPC-H with the current query set; it is compatible with the current benchmark specification rules and could be implemented today.
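The two skew models described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's data generator: the function names, the seeded sampling step, and the example parameters (five large nations holding 80% of the rows) are assumptions chosen only for illustration. The sole fact taken from TPC-H itself is that the NATION table has 25 rows.

```python
import random

NUM_NATIONS = 25  # the TPC-H NATION table has 25 rows


def proportional_weights(populations):
    """First model: each nation's share of customers or suppliers is
    proportional to a supplied population figure (hypothetical input)."""
    total = sum(populations)
    return [p / total for p in populations]


def two_group_weights(num_nations, num_large, large_share):
    """Second model: `num_large` nations split `large_share` of the rows
    equally; the remaining nations split the rest equally."""
    if not 0 < num_large < num_nations:
        raise ValueError("num_large must be between 1 and num_nations - 1")
    if not 0.0 < large_share < 1.0:
        raise ValueError("large_share must be strictly between 0 and 1")
    large = large_share / num_large
    small = (1.0 - large_share) / (num_nations - num_large)
    return [large] * num_large + [small] * (num_nations - num_large)


def assign_nation_keys(num_rows, weights, seed=0):
    """Draw a nation key for each row according to the given weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_rows)
```

For example, `assign_nation_keys(150_000, two_group_weights(NUM_NATIONS, 5, 0.8))` would place roughly 80% of 150,000 customer rows in the first five nations, while passing equal weights reproduces the original uniform distribution.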
