
Efficient Synthetic Data Generator for structured Data



Abstract

Data have become a torrent flowing into every area of the global economy, and data sizes are growing beyond the zettabyte barrier. Applications that serve this growing data volume must therefore be able to handle large amounts of data efficiently. Application performance degrades as data size increases, and an application that breaks down because it cannot handle data beyond its capacity can cost money, time, and business reputation. Performance testing on large data sizes is necessary to avoid such catastrophic outcomes, and applications must be tested against growing data sizes to ensure that the SLA (Service Level Agreement) is met. Testing such applications requires data that resembles real-life data and is very large in volume. The solution to this problem is the 'efficient' generation of customized test data, where efficient means generating data as fast as possible while utilizing the underlying resources maximally. This paper proposes a novel data generator that produces large amounts of test data efficiently. Its efficiency is comparatively higher than that of other data generators because it can utilize all underlying resources (CPU, disk, memory) maximally. Maintaining a dynamic queue size for every resource, finding the optimal buffer size on the fly, and spawning an optimal number of threads that run in parallel are a few of the key techniques that make the data generator efficient. As a case study, we compare the performance of the proposed data generator with 'dbgen', the TPC-H benchmark data generator, and show that the proposed generator is up to 8 times faster than 'dbgen' on the same hardware.
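The abstract names three ingredients: per-resource queues, a write buffer, and parallel generator threads. The Python sketch below illustrates only that general pipeline shape and is not the paper's implementation: the function name `generate_rows`, the three-column row layout, and all parameters are hypothetical, and the queue depth, batch size, and thread count are fixed arguments here rather than tuned on the fly as the abstract describes.

```python
import io
import queue
import random
import threading


def generate_rows(out, total_rows, workers=4, batch_rows=1000, queue_batches=8):
    """Write `total_rows` pipe-delimited rows to the text stream `out`.

    Hypothetical sketch: sizes are fixed, not adapted at run time.
    """
    # Bounded queue: generator threads block when the writer falls behind.
    batches = queue.Queue(maxsize=queue_batches)

    def produce(worker_id, start, count):
        rng = random.Random(worker_id)  # per-worker RNG, seeded for repeatability
        for first in range(start, start + count, batch_rows):
            last = min(first + batch_rows, start + count)
            lines = [
                f"{key}|{rng.randint(1, 50)}|{rng.uniform(1, 1000):.2f}\n"
                for key in range(first, last)
            ]
            # One large string per batch keeps the number of write calls low.
            batches.put("".join(lines))

    def drain():
        # Single writer thread: the only code that touches `out`.
        while True:
            chunk = batches.get()
            if chunk is None:  # sentinel: all generators have finished
                break
            out.write(chunk)

    writer = threading.Thread(target=drain)
    writer.start()

    # Give each worker a contiguous, non-overlapping range of keys.
    share, extra = divmod(total_rows, workers)
    threads = []
    start = 0
    for worker_id in range(workers):
        count = share + (1 if worker_id < extra else 0)
        t = threading.Thread(target=produce, args=(worker_id, start, count))
        threads.append(t)
        t.start()
        start += count

    for t in threads:
        t.join()
    batches.put(None)
    writer.join()


if __name__ == "__main__":
    sink = io.StringIO()
    generate_rows(sink, 10_000)
    print(len(sink.getvalue().splitlines()), "rows generated")
```

In CPython the global interpreter lock means these threads mainly overlap row generation with I/O; CPU-bound generation would need processes or a language with truly parallel threads to approach the resource utilization the abstract describes.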

Bibliographic details

  • Source
    CMG imPACt | 2016 | pp. 993-1002 | 10 pages
  • Author

    Chetan Phalak

  • Original format: PDF
