Annual Conference on Learning Theory (COLT 2006); June 22-25, 2006; Pittsburgh, PA (US)

Significance and Recovery of Block Structures in Binary Matrices with Noise

Abstract

Frequent itemset mining (FIM) is one of the core problems in the field of Data Mining and occupies a central place in its literature. One equivalent form of FIM can be stated as follows: given a rectangular data matrix with binary entries, find every submatrix of 1s having a minimum number of columns. This paper presents a theoretical analysis of several statistical questions related to this problem when noise is present. We begin by establishing several results concerning the extremal behavior of submatrices of ones in a binary matrix with random entries. These results provide simple significance bounds for the output of FIM algorithms. We then consider the noise sensitivity of FIM algorithms under a simple binary additive noise model, and show that, even at small noise levels, large blocks of 1s leave behind fragments of only logarithmic size. Thus such blocks cannot be directly recovered by FIM algorithms, which search for submatrices of all 1s. On the positive side, we show how, in the presence of noise, an error-tolerant criterion can recover a square submatrix of 1s against a background of 0s, even when the size of the target submatrix is very small.
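The contrast between an exact all-1s test and an error-tolerant test can be illustrated with a small sketch. The code below is not taken from the paper: the function names, the threshold parameter `tau`, and the density rule (a fraction of 1s of at least `1 - tau`) are assumptions chosen for illustration, and the paper's actual criterion may differ. The noise function models binary additive noise as an independent flip of each entry with probability `p`, which is likewise an assumed reading of the abstract.

```python
import random


def add_noise(matrix, p, rng):
    """Flip each entry independently with probability p (assumed noise model)."""
    return [[x ^ (rng.random() < p) for x in row] for row in matrix]


def ones_fraction(matrix, rows, cols):
    """Fraction of 1s in the submatrix indexed by rows x cols."""
    total = len(rows) * len(cols)
    return sum(matrix[i][j] for i in rows for j in cols) / total


def is_all_ones(matrix, rows, cols):
    """Exact criterion: every entry of the submatrix is 1."""
    return all(matrix[i][j] == 1 for i in rows for j in cols)


def is_dense(matrix, rows, cols, tau):
    """Hypothetical error-tolerant criterion: at least a 1 - tau fraction of 1s."""
    return ones_fraction(matrix, rows, cols) >= 1 - tau


# A 6 x 6 background of 0s with a 4 x 4 block of 1s in the top-left corner.
block = list(range(4))
clean = [[1 if i in block and j in block else 0 for j in range(6)]
         for i in range(6)]

# Corrupt a single entry of the block by hand so the example is deterministic.
noisy = [row[:] for row in clean]
noisy[1][2] = 0

print(is_all_ones(clean, block, block))    # exact test on the clean block
print(is_all_ones(noisy, block, block))    # exact test after one flipped entry
print(is_dense(noisy, block, block, 0.1))  # tolerant test after one flipped entry
```

A single flipped entry is enough to make the exact test reject the full block, while the tolerant test still accepts it because 15 of its 16 entries remain 1. For randomized experiments, `add_noise(clean, p, random.Random(seed))` can replace the hand-made corruption.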
