Introducing TPCx-HS: The First Industry Standard for Benchmarking Big Data Systems

Abstract

The term Big Data has become a mainstream buzz phrase across many industries as well as research circles. Today many companies make performance claims that, in the absence of a neutral industry benchmark, are not easily verifiable or comparable. Instead, one of the test suites used to compare the performance of Hadoop-based Big Data systems is TeraSort. While it nicely defines the data set and tasks for measuring Big Data Hadoop systems, it lacks a formal specification and enforcement rules that would enable the comparison of results across systems. In this paper we introduce TPCx-HS, the first industry-standard benchmark for Big Data systems, designed to stress both the hardware and the software of systems based on Apache HDFS API compatible distributions. TPCx-HS extends the workload defined in TeraSort with formal rules for implementation, execution, metric, result verification, publication, and pricing. It can be used to assess a broad range of system topologies and implementation methodologies of Big Data Hadoop systems in a technically rigorous, directly comparable, and vendor-neutral manner.