Journal: Knowledge and Information Systems

High average-utility sequential pattern mining based on uncertain databases

Abstract

The emergence and proliferation of Internet of Things (IoT) devices have led to the generation of large volumes of uncertain data, owing to the varying accuracy, decay, and sensitivity ranges of sensors. Because uncertainty is inherent in IoT data, mining useful information from uncertain datasets has become an important research problem in recent decades. Previous works focus on mining high-utility sequential patterns from uncertain databases. However, the utility of a sequence tends to grow with its length, which makes total utility an unfair measure: any sequence built by extending a high-utility sequence is likely to be a high-utility sequence as well, even when the utility contributed by the added items is low. In this paper, we address this limitation of previous potentially high-utility sequential pattern mining and present a potentially high average-utility sequential pattern mining framework that discovers the set of potentially high average-utility sequential patterns (PHAUSPs) from uncertain datasets. By taking the size of a sequence into account, the framework provides a fairer measure of patterns than previous works. First, a baseline potentially high average-utility sequential pattern mining algorithm and three pruning strategies are introduced to mine the complete set of PHAUSPs. Then, to reduce the computational cost and accelerate the mining process, a projection-based algorithm called PHAUP is designed, which reduces the number of candidate patterns. Experiments on both real-life and synthetic datasets evaluate runtime, number of candidates, memory overhead, number of discovered patterns, and scalability. The results show that the proposed algorithms achieve promising performance, especially PHAUP.
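The fairness argument can be illustrated with a toy calculation. The sketch below uses a simplified model that is an assumption and is not taken from the paper:

* Each sequence element holds one item with a precomputed utility.
* Each database sequence carries an existence probability.
* The utility of a pattern in a sequence is the maximum over its occurrences.
* Average utility is total utility divided by the number of items in the pattern.
* Expected support is the sum of the probabilities of the sequences that contain the pattern.

The names `max_utility`, `measures`, and `is_phausp`, and both thresholds, are hypothetical and are not the paper's definitions or pruning strategies.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical toy model (an assumption, not the paper's definitions):
# a record is (existence probability, [(item, utility), ...]) and each
# sequence element holds a single item with a precomputed utility.
Record = Tuple[float, List[Tuple[str, int]]]


def max_utility(pattern: Sequence[str], seq: List[Tuple[str, int]]) -> Optional[int]:
    """Largest total utility over all occurrences of `pattern` in `seq`; None if absent."""
    # best[j] = best utility of a match of the first j pattern items seen so far
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for item, util in seq:
        # Iterate j downwards so one sequence position extends at most one prefix.
        for j in range(len(pattern), 0, -1):
            prev = best[j - 1]
            if item == pattern[j - 1] and prev is not None:
                if best[j] is None or prev + util > best[j]:
                    best[j] = prev + util
    return best[len(pattern)]


def measures(pattern: Sequence[str], db: List[Record]) -> Tuple[int, float, float]:
    """Return (total utility, average utility, expected support) of `pattern` in `db`."""
    total, expected_support = 0, 0.0
    for prob, seq in db:
        u = max_utility(pattern, seq)
        if u is not None:  # only sequences that contain the pattern contribute
            total += u
            expected_support += prob
    # Average utility divides by the pattern size (number of items).
    return total, total / len(pattern), expected_support


def is_phausp(pattern: Sequence[str], db: List[Record],
              min_avg_util: float, min_support: float) -> bool:
    """Hypothetical check: both thresholds must be met."""
    _, avg, support = measures(pattern, db)
    return avg >= min_avg_util and support >= min_support


db: List[Record] = [
    (0.9, [("a", 10), ("b", 1), ("c", 1)]),
    (0.6, [("a", 8), ("c", 2)]),
]
for pat in [("a",), ("a", "c"), ("a", "b", "c")]:
    print(pat, measures(pat, db))
# ('a',) (18, 18.0, 1.5)
# ('a', 'c') (21, 10.5, 1.5)
# ('a', 'b', 'c') (12, 4.0, 0.9)
```

In this toy data, extending ("a",) to ("a", "c") raises the total utility from 18 to 21 only because another item is added. The average utility drops from 18.0 to 10.5, so the size-aware measure no longer rewards the longer pattern.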