Published in: International Conference on Knowledge Science, Engineering and Management

Agile Query Processing in Statistical Databases: A Process-In-Memory Approach

Abstract

Statistical database systems are designed to answer queries on summarized data (macro data); queries on raw records are not allowed in such systems. Because macro data offers aggregate information about the database, statistical queries are also an effective way to provide analytical results in semantic databases. However, traditional statistical databases were proposed for security protection, i.e., hiding the raw records from user queries, and few studies address the optimization of aggregate queries in statistical databases. In this paper, we propose a new process-in-memory (PIM) based processing scheme, called agile query, for accelerating queries in statistical databases. Agile query introduces two new designs. First, we propose an in-memory index that caches aggregate operators (e.g., sum, min, max, count, and average) in main memory. Aggregate queries that hit the in-memory index are evaluated in memory and incur no I/O operations. Second, we incrementally update the in-memory operator index to ensure consistency between the cached data and the original data records. We implement the agile query processing framework on top of MySQL and conduct experiments over datasets of various sizes to compare our design with the traditional method in MySQL. The results show that our proposal achieves up to 9 times higher throughput than MySQL under the skewed Zipf query set, and on average about 2 times higher throughput under random and uniformly distributed queries.
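
The two designs named in the abstract, an in-memory index of aggregate operators and its incremental maintenance, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `AggEntry`, `AggregateCache`, `query`, `insert`, and `delete` are hypothetical, a plain list of dictionaries stands in for the MySQL table, and the choice to invalidate min/max when an extreme value is deleted is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AggEntry:
    """Cached aggregates for one column (hypothetical layout)."""
    count: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    extremes_valid: bool = True  # False when min/max need a rescan


class AggregateCache:
    """Toy in-memory aggregate index over a list of row dictionaries."""

    def __init__(self, rows: List[Dict[str, float]]):
        self.rows = list(rows)               # stand-in for the stored table
        self.index: Dict[str, AggEntry] = {}
        self.scans = 0                       # counts simulated full scans

    def _scan(self, column: str) -> AggEntry:
        # Cache miss: compute every aggregate in one pass and cache it.
        self.scans += 1
        values = [row[column] for row in self.rows]
        entry = AggEntry(len(values), float(sum(values)),
                         min(values, default=None),
                         max(values, default=None))
        self.index[column] = entry
        return entry

    def query(self, op: str, column: str) -> Optional[float]:
        entry = self.index.get(column)
        stale = entry is not None and op in ("min", "max") \
            and not entry.extremes_valid
        if entry is None or stale:
            entry = self._scan(column)
        if op == "count":
            return entry.count
        if op == "sum":
            return entry.total
        if op == "min":
            return entry.minimum
        if op == "max":
            return entry.maximum
        if op == "avg":
            return entry.total / entry.count if entry.count else None
        raise ValueError(f"unsupported aggregate: {op}")

    def insert(self, row: Dict[str, float]) -> None:
        # Incremental update: adjust cached aggregates without rescanning.
        self.rows.append(row)
        for column, entry in self.index.items():
            value = row[column]
            entry.count += 1
            entry.total += value
            if entry.extremes_valid:
                entry.minimum = value if entry.minimum is None \
                    else min(entry.minimum, value)
                entry.maximum = value if entry.maximum is None \
                    else max(entry.maximum, value)

    def delete(self, row: Dict[str, float]) -> None:
        self.rows.remove(row)
        for column, entry in self.index.items():
            value = row[column]
            entry.count -= 1
            entry.total -= value
            # Removing an extreme cannot be repaired incrementally here,
            # so min/max are marked stale and recomputed on the next use.
            if value == entry.minimum or value == entry.maximum:
                entry.extremes_valid = False
```

In this sketch, the first aggregate query on a column pays for one scan; later `sum`, `count`, and `avg` queries on that column are answered from the cached entry even after inserts and deletes, while `min` and `max` fall back to a rescan only after an extreme value has been deleted.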