BlinkDB: queries with bounded errors and bounded response times on very large data

Abstract

In this paper, we present BlinkDB, a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy or response time requirements. We evaluate BlinkDB against the well-known TPC-H benchmarks and a real-world analytic workload derived from Conviva Inc., a company that manages video distribution over the Internet. Our experiments on a 100-node cluster show that BlinkDB can answer queries on up to 17 TB of data in less than 2 seconds (over 200× faster than Hive), within an error of 2-10%.
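The two ideas in the abstract, stratified samples and choosing a sample size from an accuracy requirement, can be illustrated with a minimal sketch. This is not BlinkDB's code or API: the function names (`stratified_sample`, `approx_avg`, `rows_needed`), the per-stratum cap, and the normal-approximation error bar are all assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


def stratified_sample(rows, key, cap, rng):
    """Keep at most `cap` rows per distinct value of `key` (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = {}
    for value, members in groups.items():
        kept = members if len(members) <= cap else rng.sample(members, cap)
        sample[value] = (kept, len(members))  # sampled rows plus true stratum size
    return sample


def approx_avg(stratum, column, z=1.96):
    """Estimate AVG(column) for one stratum; returns (mean, error-bar half-width)."""
    kept, population = stratum
    n = len(kept)
    values = [row[column] for row in kept]
    mean = sum(values) / n
    if n == population:
        return mean, 0.0  # stratum fully scanned: no sampling error
    if n < 2:
        return mean, math.inf  # one sampled row cannot bound the error
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    fpc = 1.0 - n / population  # finite population correction
    return mean, z * math.sqrt(variance / n * fpc)


def rows_needed(std, target_half_width, z=1.96):
    """Smallest sample size whose error bar fits the target (ignores the fpc)."""
    return math.ceil((z * std / target_half_width) ** 2)


# Assumed toy data: one frequent group and one rare group.
rng = random.Random(0)
rows = [{"city": "big", "latency": i % 100} for i in range(10000)]
rows += [{"city": "rare", "latency": 7} for _ in range(5)]

sample = stratified_sample(rows, "city", cap=500, rng=rng)
for city, stratum in sample.items():
    mean, half_width = approx_avg(stratum, "latency")
    print(f"{city}: AVG(latency) = {mean:.2f} +/- {half_width:.2f}")
```

Capping each stratum keeps rare groups fully represented while frequent groups are subsampled, which is why the rare group above reports no sampling error. `rows_needed` shows the reverse direction: given a tolerated error, it estimates how large a sample a query would have to read.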

Bibliographic details

Authors
Agarwal Sameer; Mozafari Barzan; Panda Aurojit; Milner Henry; Stoica Ion; Madden Samuel R.
Author affiliations
Year 2013
Total pages
Original format PDF
Language en_US
CLC classification

Similar literature

1. BlinkDB: queries with bounded errors and bounded response times on very large data [J]. Mohamed Eltabakh. Computing reviews. 2014, No. 3
2. Parameter bounds for discrete-time Hammerstein models with bounded output errors [J]. Cerone V., Regruto D. IEEE Transactions on Automatic Control. 2003, No. 10
3. Bounded similarity querying for time-series data [J]. Goldin DQ, Millstein TD, Kutlu A. Information and computation. 2004, No. 2
4. Approximate Query Processing Using Wavelets in OLAP with Arbitrarily Sized Data and Bounded Errors [C]. A. Ukharov, A. Burdakov, U. Grigorev. Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. 2016
5. Queries with Bounded Errors & Bounded Response Times on Very Large Data. [D]. Agarwal, Sameer. 2014
6. Quadrant-Based Minimum Bounding Rectangle-Tree Indexing Method for Similarity Queries over Big Spatial Data in HBase [O]. Bumjoon Jo, Sungwon Jung. 2018
7. BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data [O]. Agarwal, Sameer, Panda, Aurojit, Mozafari, Barzan. 2012
