Enhancing Data Generation in TPCx-HS with a Non-uniform Random Distribution

Abstract

Developed by the Transaction Processing Performance Council, the TPC Express Benchmark™ HS (TPCx-HS) is the industry's first standard for benchmarking big data systems. It is designed to provide an objective measure of hardware, operating systems, and commercial software distributions compatible with the Apache Hadoop File System API, and to provide the industry with verifiable performance, price-performance, and availability metrics [1, 2]. It can be used to compare a broad range of system topologies and implementation methodologies for big data systems in a technically rigorous, directly comparable, and vendor-neutral manner. The modeled application is simple, and the results are highly relevant to hardware and software dealing with big data systems in general. Data generation in TPCx-HS is derived from TeraGen [3], which produces uniformly distributed data. In this paper the authors propose using a normal (Gaussian) distribution instead, which may be more representative of real-life datasets. The modified TeraGen and the complete changes required to the TPCx-HS kit are included as part of this paper.
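The contrast between uniform and normally distributed keys can be illustrated with a minimal sketch. This is not the authors' modified TeraGen. The 10-byte key width, the function names, the mean and standard deviation, and the clamping of out-of-range values are all assumptions chosen for illustration.

```python
import random

KEY_BYTES = 10                 # assumed key width for illustration
KEY_SPACE = 256 ** KEY_BYTES   # number of distinct keys of that width


def uniform_key(rng):
    """Draw a key uniformly from the whole key space."""
    return rng.randrange(KEY_SPACE).to_bytes(KEY_BYTES, "big")


def gaussian_key(rng, mean_frac=0.5, std_frac=0.125):
    """Draw a key whose integer value is normally distributed.

    mean_frac and std_frac are hypothetical parameters expressed as
    fractions of the key space. Values outside the key space are clamped,
    and float precision leaves the low-order bits zero, which is
    acceptable for a demonstration only.
    """
    value = int(rng.gauss(mean_frac, std_frac) * KEY_SPACE)
    value = min(max(value, 0), KEY_SPACE - 1)
    return value.to_bytes(KEY_BYTES, "big")


def first_byte_histogram(keys, buckets=8):
    """Count keys per equal-width bucket of the leading key byte."""
    counts = [0] * buckets
    for key in keys:
        counts[key[0] * buckets // 256] += 1
    return counts


rng = random.Random(42)  # fixed seed so runs are repeatable
uniform_hist = first_byte_histogram([uniform_key(rng) for _ in range(10000)])
gaussian_hist = first_byte_histogram([gaussian_key(rng) for _ in range(10000)])
print("uniform :", uniform_hist)
print("gaussian:", gaussian_hist)
```

With uniform keys every bucket receives roughly the same number of records, whereas the Gaussian keys concentrate in the middle buckets. A sort stage that partitions the key range evenly would therefore see skewed partition sizes under the Gaussian input, which is the kind of behavior a non-uniform generator is meant to exercise.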
