Rethinking Benchmarking for Data

Abstract

Benchmarking has been critical to progress in the field of data, providing a crucial mechanism to accelerate the work of the data community. Early benchmarks were responsible for spurring innovation and served as a quantitative way to get past marketing salvos. Prime examples are the Anon et al. benchmark and the Wisconsin benchmark, which spurred rapid advances in database transaction processing and analytic query processing. These two technologies are now crucial to running our "digital planet" today. However, data benchmarking has changed considerably over the past four decades. In the early days, pioneers like Jim Gray and David DeWitt were crucial in creating benchmarks that were genuinely designed to move the community forward. Back then the data industry was in its "Wild West" days, and a few well-meaning cowboys were all it took to set the industry in the right direction. Sadly, those halcyon days are long gone. The digital planet is simply too dependent on data. In fact, as has been noted before, data is the new currency. Thus, there are deeply vested interests in modern benchmarks that simply do not achieve the goals that benchmarks claim to achieve. To address these issues, this article proposes a radical rethinking of data benchmarks. It makes three concrete suggestions. First, data benchmarks should have no optional components, forcing vendors to make "hard" choices when reporting benchmark results (e.g., reporting energy consumption in TPC benchmarks, and reporting results on newer benchmarks that subsume older ones). Second, benchmarking in the cloud era implies that each customer will have their own measures that matter to them; thus, a service that offers automated benchmarking (and associated tuning) of customer workloads in the cloud is far more important than the benchmarks themselves. Finally, we should dramatically rethink how our benchmark councils (including the TPC) work: we should reverse the stewardship of these councils by replacing the vendors on the councils with actual customers of data products, and let customers directly drive the definition of new benchmarks.
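To make the second suggestion slightly more concrete, the sketch below illustrates, in Python, what a customer-facing "benchmark your own workload" cloud service could look like: the customer supplies a trace of the queries that actually matter to them, the service replays the trace against candidate configurations, and every report includes latency, dollar cost, and energy with no optional fields. All names here (WorkloadTrace, CandidateConfig, benchmark_workload) are hypothetical; the abstract proposes the idea of such a service, not any particular API.

```python
"""Hypothetical sketch of an automated customer-workload benchmarking service.
The article argues for the concept; this API is purely illustrative."""

import statistics
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkloadTrace:
    """A customer-captured trace: the queries that actually matter to them."""
    name: str
    queries: List[str]


@dataclass
class CandidateConfig:
    """One service/configuration under test (e.g., instance size or engine)."""
    label: str
    run_query: Callable[[str], None]   # executes one query against this config
    price_per_second: float            # assumed cloud cost model
    watts: float                       # assumed average power draw


@dataclass
class BenchmarkReport:
    """Customer-chosen measures; energy reporting is not optional."""
    label: str
    median_latency_s: float
    total_cost_usd: float
    total_energy_j: float


def benchmark_workload(trace: WorkloadTrace,
                       configs: List[CandidateConfig]) -> List[BenchmarkReport]:
    """Replay the customer's own workload against each candidate configuration."""
    reports = []
    for cfg in configs:
        latencies = []
        for q in trace.queries:
            start = time.perf_counter()
            cfg.run_query(q)
            latencies.append(time.perf_counter() - start)
        elapsed = sum(latencies)
        reports.append(BenchmarkReport(
            label=cfg.label,
            median_latency_s=statistics.median(latencies),
            total_cost_usd=elapsed * cfg.price_per_second,
            total_energy_j=elapsed * cfg.watts,
        ))
    # Rank by whatever measure the customer cares about; cost is just a default.
    return sorted(reports, key=lambda r: r.total_cost_usd)


if __name__ == "__main__":
    # Toy stand-ins for real engines: each "query" just sleeps briefly.
    fast = CandidateConfig("large-instance", lambda q: time.sleep(0.001),
                           price_per_second=0.002, watts=250.0)
    slow = CandidateConfig("small-instance", lambda q: time.sleep(0.003),
                           price_per_second=0.0005, watts=90.0)
    trace = WorkloadTrace("customer-trace", ["SELECT 1"] * 20)
    for report in benchmark_workload(trace, [fast, slow]):
        print(report)
```

The design choice this sketch tries to capture is that the workload and the ranking criterion come from the customer, not from a vendor-defined benchmark specification, and that every metric in the report is mandatory.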