
BlinkDB: queries with bounded errors and bounded response times on very large data



Abstract

In this paper, we present BlinkDB, a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from the original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy or response-time requirements. We evaluate BlinkDB against the well-known TPC-H benchmark and a real-world analytic workload derived from Conviva Inc., a company that manages video distribution over the Internet. Our experiments on a 100-node cluster show that BlinkDB can answer queries on up to 17 TB of data in less than 2 seconds (over 200× faster than Hive), within an error of 2-10%.
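The ideas in the abstract (stratified samples, error-bar annotated answers, and choosing a sample size from an accuracy or time requirement) can be illustrated with a minimal Python sketch. This is not BlinkDB's implementation or API. The helper names (`stratified_sample`, `avg_with_error`, `rows_needed`, `largest_within_budget`), the per-group row cap, the normal-approximation (CLT) 95% confidence interval, and the linear scan-cost model are all illustrative assumptions.

```python
# Illustrative sketch only; not BlinkDB's implementation or API.
# All helper names below are hypothetical.
import math
import random
import statistics
from collections import defaultdict


def stratified_sample(rows, key, cap, seed=0):
    """Keep at most `cap` rows per distinct value of `key` (hypothetical helper).

    Small groups are kept whole; large groups are down-sampled uniformly,
    so every group is still represented in the sample.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        if len(members) <= cap:
            sample.extend(members)
        else:
            sample.extend(rng.sample(members, cap))
    return sample


def avg_with_error(values, z=1.96):
    """Return (mean, half-width of an approximate 95% CI).

    Assumes the CLT normal approximation and ignores the finite-population correction.
    """
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, math.inf  # no spread estimate from a single row
    return mean, z * statistics.stdev(values) / math.sqrt(len(values))


def rows_needed(pilot_values, max_error, z=1.96):
    """Rows needed so the CI half-width z*sigma/sqrt(n) is at most max_error."""
    sigma = statistics.stdev(pilot_values)
    return math.ceil((z * sigma / max_error) ** 2)


def largest_within_budget(sample_sizes, rows_per_second, time_budget_s):
    """Pick the largest precomputed sample whose scan time fits the budget.

    Assumes scan time grows linearly with sample size.
    """
    affordable = [n for n in sample_sizes if n / rows_per_second <= time_budget_s]
    return max(affordable) if affordable else min(sample_sizes)


# Usage on synthetic data: a frequent group "A" and a rare group "B".
rng = random.Random(42)
rows = [{"city": "A", "latency": rng.gauss(100, 20)} for _ in range(10_000)]
rows += [{"city": "B", "latency": 50.0 + i} for i in range(5)]

sample = stratified_sample(rows, key="city", cap=500)
for city in ("A", "B"):
    mean, err = avg_with_error([r["latency"] for r in sample if r["city"] == city])
    print(f"AVG(latency) for {city}: {mean:.1f} +/- {err:.1f}")

print(rows_needed([0, 2], max_error=1.0))
print(largest_within_budget([1_000, 10_000, 100_000], rows_per_second=20_000, time_budget_s=1.0))
```

Capping rows per group is what keeps the rare group `B` fully represented in a 505-row sample; a uniform 5% sample of the same 10,005 rows would be expected to contain fewer than one of its rows.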
