Volatile Correlation Computation: A Checkpoint View

Abstract

Recent years have witnessed increased interest in computing strongly correlated pairs in very large databases. Most previous studies have been focused on static data sets. However, in real-world applications, input data are often dynamic and must continually be updated. With such large and growing data sets, new research efforts are expected to develop an incremental solution for correlation computing. Along this line, in this paper, we propose a CHECK-POINT algorithm that can efficiently incorporate new transactions for correlation computing as they become available. Specifically, we set a checkpoint to establish a computation buffer, which can help us determine an upper bound for the correlation. This checkpoint bound can be exploited to identify a list of candidate pairs, which will be maintained and computed for correlations as new transactions are added into the database. However, if the total number of new transactions is beyond the buffer size, a new upper bound is computed by the new checkpoint and a new list of candidate pairs is identified. Experimental results on real-world data sets show that CHECK-POINT can significantly reduce the correlation computing cost in dynamic data sets and has the advantage of compacting the use of memory space.
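
The checkpoint idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several choices are assumptions made only for illustration: the correlation measure is taken to be the phi coefficient over binary items, the upper bound is found by brute-force enumeration of every way to add at most `buffer_size` transactions (the abstract does not state how CHECK-POINT derives its bound), and the names `phi`, `phi_upper_bound`, and `CheckpointMonitor` are hypothetical. The sketch also keeps all transactions in memory for simplicity, so it does not reproduce the memory savings reported in the abstract.

```python
from itertools import combinations
from math import sqrt


def phi(n, na, nb, nab):
    """Phi correlation of two binary items from counts; 0.0 when undefined."""
    denom = na * nb * (n - na) * (n - nb)
    if denom <= 0:
        return 0.0
    return (n * nab - na * nb) / sqrt(denom)


def phi_upper_bound(n, na, nb, nab, buffer_size):
    """Largest phi reachable by adding at most buffer_size transactions.

    Brute force (an assumption of this sketch): try every split of the new
    transactions into 'both items', 'only a', 'only b', and 'neither'.
    """
    best = phi(n, na, nb, nab)
    for both in range(buffer_size + 1):
        for only_a in range(buffer_size + 1 - both):
            for only_b in range(buffer_size + 1 - both - only_a):
                used = both + only_a + only_b
                for neither in range(buffer_size + 1 - used):
                    value = phi(n + used + neither,
                                na + both + only_a,
                                nb + both + only_b,
                                nab + both)
                    best = max(best, value)
    return best


class CheckpointMonitor:
    """Hypothetical monitor that tracks only candidate pairs between checkpoints."""

    def __init__(self, transactions, items, threshold, buffer_size):
        self.items = sorted(items)
        self.threshold = threshold
        self.buffer_size = buffer_size
        self.transactions = [set(t) for t in transactions]
        self._checkpoint()

    def _checkpoint(self):
        # Recount everything and keep only pairs whose bound reaches the threshold.
        self.n = len(self.transactions)
        self.pending = 0
        self.item_count = {i: sum(i in t for t in self.transactions)
                           for i in self.items}
        self.pair_count = {}
        for a, b in combinations(self.items, 2):
            nab = sum(a in t and b in t for t in self.transactions)
            bound = phi_upper_bound(self.n, self.item_count[a],
                                    self.item_count[b], nab, self.buffer_size)
            if bound >= self.threshold:
                self.pair_count[(a, b)] = nab

    def add(self, transaction):
        t = set(transaction)
        self.transactions.append(t)
        if self.pending == self.buffer_size:
            # Buffer exhausted: the old bound no longer holds, so re-checkpoint.
            self._checkpoint()
            return
        self.pending += 1
        self.n += 1
        for i in self.items:
            if i in t:
                self.item_count[i] += 1
        for a, b in self.pair_count:
            if a in t and b in t:
                self.pair_count[(a, b)] += 1

    def strong_pairs(self):
        """Candidate pairs whose current phi meets the threshold."""
        return {(a, b) for (a, b), nab in self.pair_count.items()
                if phi(self.n, self.item_count[a], self.item_count[b], nab)
                >= self.threshold}
```

Between checkpoints, `add` updates counts only for the candidate pairs, which is where the savings would come from. Because the bound in this sketch is an exact maximum over all reachable count states, no pair outside the candidate list can reach the threshold before the buffer is used up.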
