Journal: ACM Transactions on Database Systems

Mining Order-Preserving Submatrices from Probabilistic Matrices

Abstract

Order-preserving submatrices (OPSMs) capture consensus trends over columns shared by rows in a data matrix. Mining OPSM patterns discovers important and interesting local correlations in many real applications, such as those involving biological data or sensor data. The prevalence of uncertain data in various applications, however, poses new challenges for OPSM mining, since data uncertainty must be incorporated into both the OPSM model and the mining algorithms. In this article, we define new probabilistic matrix representations to model uncertain data with continuous distributions. A novel probabilistic order-preserving submatrix (POPSM) model is formalized in order to capture similar local correlations in probabilistic matrices. The POPSM model adopts a new probabilistic support measure that evaluates the extent to which a row belongs to a POPSM pattern. Because the POPSM mining problem has intrinsically high computational complexity, we exploit the anti-monotonic property of the probabilistic support measure and propose an efficient Apriori-based mining framework called ProbApri to mine POPSM patterns. The framework consists of two mining methods, UniApri and NormApri, which mine POPSM patterns from two representative types of probabilistic matrices: the UniDist matrix (assuming uniform data distributions) and the NormDist matrix (assuming normal data distributions), respectively. We show that the NormApri method is practical enough for mining POPSM patterns from probabilistic matrices that model more general data distributions. We demonstrate the superiority of our approach through two applications. First, we use two biological datasets to illustrate that the POPSM model better captures the characteristics of the expression levels of biologically correlated genes and greatly promotes the discovery of patterns with high biological significance. Our results are significantly better than those of the counterpart OPSMRM (OPSM with repeated measurement) model, which adopts a set-valued matrix representation to capture data uncertainty. Second, we run experiments on an RFID trace dataset and show that our POPSM model is effective and efficient in capturing the common visiting subroutes among users.
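
The ideas in the abstract (a row supporting a column order, a probabilistic support value, and Apriori-style pruning via anti-monotonicity) can be illustrated with a small Python sketch. This is not the authors' ProbApri, UniApri, or NormApri: the abstract does not spell out the support measure or the algorithms, so everything here is an assumption made for illustration. Each matrix entry is assumed to be an independent uniform interval `(low, high)`, the probability that a row follows an order is estimated by Monte Carlo sampling rather than by whatever computation the paper uses, and the function names and the `min_rows` and `min_prob` thresholds are hypothetical.

```python
import random
from itertools import permutations


def supports(row, order):
    """True if the row's values strictly increase along the column order."""
    return all(row[a] < row[b] for a, b in zip(order, order[1:]))


def order_probability(intervals, order, trials=20000, seed=0):
    """Monte Carlo estimate of P(row follows `order`).

    Assumption: every entry is an independent uniform draw from its
    (low, high) interval. All columns are sampled in every trial, so with
    a fixed seed the estimates for an order and its prefixes use the same
    samples and are anti-monotone by construction.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(lo, hi) for lo, hi in intervals]
        hits += supports(sample, order)
    return hits / trials


def mine_orders(matrix, min_rows, min_prob, trials=2000):
    """Level-wise (Apriori-style) search for column orders that at least
    `min_rows` rows follow with estimated probability >= `min_prob`.

    Hypothetical thresholds; a longer order can only lose rows relative to
    its prefix, so each extension re-checks the prefix's rows only.
    """
    n_cols = len(matrix[0])

    def rows_for(order, candidates):
        return [i for i in candidates
                if order_probability(matrix[i], order, trials) >= min_prob]

    # Level 2: every ordered pair of distinct columns.
    level = {}
    for order in permutations(range(n_cols), 2):
        rows = rows_for(order, range(len(matrix)))
        if len(rows) >= min_rows:
            level[order] = rows

    found = dict(level)
    while level:
        next_level = {}
        for order, rows in level.items():
            for col in range(n_cols):
                if col in order:
                    continue
                extended = order + (col,)
                kept = rows_for(extended, rows)
                if len(kept) >= min_rows:
                    next_level[extended] = kept
        found.update(next_level)
        level = next_level
    return found


if __name__ == "__main__":
    # Toy UniDist-like matrix: rows 0 and 1 trend upward, row 2 downward.
    toy = [
        [(0, 1), (2, 3), (4, 5)],
        [(0, 1), (2, 3), (4, 5)],
        [(4, 5), (2, 3), (0, 1)],
    ]
    for order, rows in sorted(mine_orders(toy, min_rows=2, min_prob=0.9).items()):
        print(order, rows)
```

Extending only orders that already passed the thresholds is what makes the search Apriori-like: an order that fails can never become valid by adding columns, so its extensions are never generated.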
