Journal: World Wide Web

POLYTOPE: a flexible sampling system for answering exploratory queries


Abstract

Data exploration is usually time-consuming. Analysts who want to find items of interest or verify a hypothesis may prefer a lower response time and tolerate a bounded error. Approximate query processing (AQP) achieves this by using pre-computed samples to speed up query answering. Existing sampling-based AQP systems usually apply a single sampling strategy to the whole dataset. During data exploration, however, potential interests may be spread across different parts of the dataset, so the queries users submit vary widely across sub-datasets. A single sampling strategy therefore cannot serve all queries that access different sub-datasets. In this paper, we propose POLYTOPE, a flexible and effective sampling system designed specifically for data exploration tasks. It rests on three key ideas: (1) split the dataset into sampling blocks according to user query patterns, (2) generate a set of optimized samples for each sampling block individually, and (3) automatically select an optimal sample at run time. We use both user query patterns and the underlying data distribution to realize these ideas. We have implemented the system on the Spark platform, and our comprehensive experimental results show that it improves accuracy by up to 46% under the same time constraint for data exploration tasks.
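The three ideas in the abstract can be pictured with a small, single-machine sketch. This is not POLYTOPE's implementation, which the abstract says runs on Spark and optimizes samples from query patterns and the data distribution. Everything below is an assumption made for illustration: the function names, the fixed block boundaries, the use of plain uniform sampling at a few fixed rates, and a row budget standing in for the time constraint.

```python
import random
from statistics import mean


def split_into_blocks(rows, key, boundaries):
    """Idea (1), simplified: partition rows by a key column.

    `boundaries` is a hypothetical stand-in for split points that a real
    system would derive from observed user query patterns.
    """
    blocks = {i: [] for i in range(len(boundaries) + 1)}
    for row in rows:
        # Block index = number of boundaries the key value has reached.
        idx = sum(row[key] >= b for b in boundaries)
        blocks[idx].append(row)
    return blocks


def build_samples(block, rates, rng):
    """Idea (2), simplified: precompute uniform samples at several rates."""
    samples = {}
    for rate in rates:
        k = max(1, round(len(block) * rate))
        samples[rate] = rng.sample(block, min(k, len(block)))
    return samples


def pick_sample(samples, row_budget):
    """Idea (3), simplified: take the largest sample within the row budget.

    Falls back to the smallest sampling rate if nothing fits.
    """
    fitting = [rate for rate, s in samples.items() if len(s) <= row_budget]
    rate = max(fitting) if fitting else min(samples)
    return rate, samples[rate]


def approx_avg(sample, column):
    """Estimate AVG(column) from a uniform sample of one block."""
    return mean(row[column] for row in sample)


rng = random.Random(0)
rows = [{"x": i, "y": float(i % 10)} for i in range(1000)]
blocks = split_into_blocks(rows, "x", boundaries=[500])
samples = build_samples(blocks[0], rates=[0.01, 0.1, 0.5], rng=rng)
rate, chosen = pick_sample(samples, row_budget=100)
estimate = approx_avg(chosen, "y")
```

With a budget of 100 rows, the sketch picks the 10% sample of the first block (50 rows) and estimates the average of `y` from it. A larger budget would pick a larger sample, trading response time for accuracy.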
