Big Data Normalization for Massively Parallel Processing Databases

Abstract

High-performance querying and ad-hoc querying are commonly viewed as mutually exclusive goals in massively parallel processing databases. At one extreme, a database can be set up to provide the results of a single known query so that the use of available resources is maximized and response time is minimized, but at the cost of all other queries being executed suboptimally. At the other extreme, when no query is known in advance, the database must provide the information without such optimization, normally resulting in inefficient execution of all queries. This paper introduces a novel technique, highly normalized Big Data using Anchor modeling, that provides a very efficient way to store information and utilize resources, thereby providing ad-hoc querying with high performance for the first time in massively parallel processing databases. A case study of how this approach has been used for a Data Warehouse at Avito over two years, with estimates for and results of real-data experiments carried out in HP Vertica, an MPP RDBMS, is also presented.
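
To make the idea of a highly normalized, Anchor-style layout more concrete, the following is a minimal sketch, not the authors' schema. It assumes a hypothetical "ad" entity with two attributes, and it uses Python's built-in sqlite3 module as a stand-in for an MPP database such as HP Vertica. All table, column, and function names are illustrative assumptions. Each attribute lives in its own narrow table keyed by the anchor's identifier, so an ad-hoc query only touches the attribute tables it actually needs.

```python
import sqlite3


def build_demo():
    """Create a tiny Anchor-style schema in memory (hypothetical names)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- Anchor: holds only the identity of each entity.
        CREATE TABLE ad_anchor (ad_id INTEGER PRIMARY KEY);
        -- One narrow table per attribute, keyed by the anchor id.
        CREATE TABLE ad_title (
            ad_id INTEGER PRIMARY KEY REFERENCES ad_anchor(ad_id),
            title TEXT NOT NULL
        );
        CREATE TABLE ad_price (
            ad_id INTEGER PRIMARY KEY REFERENCES ad_anchor(ad_id),
            price INTEGER NOT NULL
        );
        """
    )
    con.executemany("INSERT INTO ad_anchor VALUES (?)", [(1,), (2,), (3,)])
    con.executemany(
        "INSERT INTO ad_title VALUES (?, ?)",
        [(1, "bike"), (2, "sofa"), (3, "lamp")],
    )
    # Ad 3 has no known price: the row is simply absent, not stored as NULL.
    con.executemany("INSERT INTO ad_price VALUES (?, ?)", [(1, 120), (2, 300)])
    return con


def titles_cheaper_than(con, limit):
    """Ad-hoc query that joins only the two attribute tables it needs."""
    rows = con.execute(
        """
        SELECT t.title
        FROM ad_price AS p
        JOIN ad_title AS t ON t.ad_id = p.ad_id
        WHERE p.price < ?
        ORDER BY t.title
        """,
        (limit,),
    )
    return [row[0] for row in rows]
```

Calling `titles_cheaper_than(build_demo(), 200)` reads the price and title tables only. In a real MPP setting the performance would depend on how such tables are distributed and sorted across nodes, which this single-process sketch does not model.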
