IEEE International Conference on Data Engineering (ICDE)

Complete discovery of high-quality patterns in large numerical tensors

Abstract

Many datasets are numerical tensors, i.e., they associate n-tuples with numerical values. Until recently, the discovery of relevant local patterns in such numerical and multidimensional data received little attention, despite the broad applicative perspectives offered by this general framework. Even in the simpler 2-dimensional case, almost every proposal so far is either incomplete (i.e., it does not list every pattern) or relies on binning and mines Boolean tensors. In both cases, some information is lost during the process. In uncertain tensors, n-tuples satisfy the studied predicate to a certain extent, and no information is lost w.r.t. the original data. Given an uncertain tensor, the closed patterns are its maximal “sub-tensors” covering n-tuples that “mostly” satisfy the predicate. Defining “mostly” is the key problem: the patterns should be both relevant given the data and efficiently extractable. The proposed complete extractor reuses the enumeration principles of the state-of-the-art miner for closed n-sets but incrementally enforces the newly designed definition. As a result, the proposed algorithm runs orders of magnitude faster than its only competitor, and large datasets become tractable. The experimental section reports the discovery of dynamic patterns of influence in Twitter as well as usage patterns in a transportation network. Additional experiments on synthetic data quantitatively assess the quality of the chosen definition for the patterns.
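The abstract leaves the exact meaning of "mostly" to the paper. The sketch below makes the notions of *uncertain tensor* and *closed pattern* concrete under an assumed definition. Each n-tuple carries a degree in [0, 1]. A pattern (one non-empty subset per dimension) "mostly" satisfies the predicate if, for every element of every dimension, the total shortfall `1 - degree` over the pattern's n-tuples containing that element stays within a per-dimension tolerance. This is a hypothetical stand-in, not the paper's newly designed definition. It is also not the paper's extractor. It enumerates candidates by brute force, which is only feasible on toy data. The names `mostly_satisfies`, `closed_patterns` and `epsilons`, and the toy users × hashtags × days data, are all illustrative assumptions.

```python
from itertools import combinations, product


def nonempty_subsets(values):
    """All non-empty subsets of a collection, as frozensets."""
    items = sorted(values)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def mostly_satisfies(tensor, pattern, epsilons):
    """ASSUMED 'mostly' test (hypothetical, not the paper's definition):
    for each dimension d and each element e of pattern[d], the summed
    shortfall (1 - degree) over the pattern's n-tuples containing e
    must not exceed epsilons[d]. Absent n-tuples have degree 0."""
    for d, elements in enumerate(pattern):
        for e in elements:
            # Fix element e in dimension d; range over the other dimensions.
            axes = [[e] if k == d else sorted(pattern[k])
                    for k in range(len(pattern))]
            shortfall = sum(1.0 - tensor.get(t, 0.0) for t in product(*axes))
            if shortfall > epsilons[d] + 1e-9:  # small slack for float sums
                return False
    return True


def closed_patterns(tensor, domains, epsilons):
    """Brute force: keep the patterns that mostly satisfy the predicate
    and are not strictly included in another such pattern. Exponential
    in the domain sizes, so only suitable for toy tensors."""
    candidates = [p for p in product(*(nonempty_subsets(dom) for dom in domains))
                  if mostly_satisfies(tensor, p, epsilons)]

    def strictly_included(p, q):
        return p != q and all(a <= b for a, b in zip(p, q))

    return [p for p in candidates
            if not any(strictly_included(p, q) for q in candidates)]


# Toy users x hashtags x days uncertain tensor (made-up degrees, illustration only).
tensor = {
    ("u1", "h1", "d1"): 1.0, ("u1", "h1", "d2"): 0.9,
    ("u2", "h1", "d1"): 0.8, ("u2", "h1", "d2"): 1.0,
    ("u3", "h2", "d1"): 1.0,
}
domains = [{"u1", "u2", "u3"}, {"h1", "h2"}, {"d1", "d2"}]

for pattern in closed_patterns(tensor, domains, epsilons=(0.5, 0.5, 0.5)):
    print([sorted(dim) for dim in pattern])
```

On this toy tensor, the sketch yields two closed patterns: ({u1, u2}, {h1}, {d1, d2}) and ({u3}, {h2}, {d1}). The first one tolerates the degrees 0.9 and 0.8. A Boolean binning with a high threshold would turn those degrees into 0s, which is the kind of information loss the abstract describes. Under this assumed definition, shrinking a pattern never increases any element's shortfall. So checking inclusion among satisfying candidates is enough to identify the maximal ones.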