Mining Order-Preserving Submatrices from Probabilistic Matrices

QIONG FANG; WILFRED NG; JIANLIN FENG; YULIANG LI


Abstract

Order-preserving submatrices (OPSMs) capture consensus trends over columns shared by rows in a data matrix. Mining OPSM patterns discovers important and interesting local correlations in many real applications, such as those involving biological data or sensor data. The prevalence of uncertain data in various applications, however, poses new challenges for OPSM mining, since data uncertainty must be incorporated into both OPSM modeling and the mining algorithms. In this article, we define new probabilistic matrix representations to model uncertain data with continuous distributions. A novel probabilistic order-preserving submatrix (POPSM) model is formalized to capture similar local correlations in probabilistic matrices. The POPSM model adopts a new probabilistic support measure that evaluates the extent to which a row belongs to a POPSM pattern. Due to the intrinsically high computational complexity of the POPSM mining problem, we utilize the anti-monotonic property of the probabilistic support measure and propose an efficient Apriori-based mining framework called ProbApri to mine POPSM patterns. The framework consists of two mining methods, UniApri and NormApri, which are developed for mining POPSM patterns from two representative types of probabilistic matrices, respectively: the UniDist matrix (assuming uniform data distributions) and the NormDist matrix (assuming normal data distributions). We show that the NormApri method is practical enough for mining POPSM patterns from probabilistic matrices that model more general data distributions. We demonstrate the superiority of our approach through two applications. First, we use two biological datasets to illustrate that the POPSM model better captures the characteristics of the expression levels of biologically correlated genes and greatly promotes the discovery of patterns with high biological significance. Our result is significantly better than that of the counterpart OPSMRM (OPSM with repeated measurements) model, which adopts a set-valued matrix representation to capture data uncertainty. Second, we run experiments on an RFID trace dataset and show that our POPSM model is effective and efficient in capturing the common visiting subroutes among users.
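The abstract describes probabilistic support and Apriori-based mining only at a high level. The Python sketch below is one way to picture the idea under stated assumptions; it is not the paper's UniApri or NormApri method. It assumes each uncertain entry is an independent uniform interval (in the spirit of a UniDist matrix), takes a row's support for a column order to be the probability that its values strictly increase along that order, and estimates that probability by Monte Carlo sampling, a swapped-in technique. It also assumes an order counts as a pattern when at least `min_rows` rows reach support `min_prob`. Orders are grown level by level, and candidates with an infrequent sub-order are pruned by anti-monotonicity. All names (`sample_rows`, `order_support`, `supporting_rows`, `mine_orders`, `min_prob`, `min_rows`) are hypothetical.

```python
import random


def sample_rows(intervals, n_samples, seed=0):
    """Draw n_samples joint realizations per row.

    ASSUMPTION: intervals[r][c] is a (lo, hi) pair and entry (r, c) is
    uniform on [lo, hi], independently of all other entries.
    Returns samples[r][s][c], the value of column c in sample s of row r.
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(lo, hi) for (lo, hi) in row] for _ in range(n_samples)]
        for row in intervals
    ]


def order_support(row_samples, order):
    """Monte Carlo estimate of P(values strictly increase along `order`).

    Reusing the same samples for every order keeps the estimate
    anti-monotonic: a sample that follows an order also follows each of
    its sub-orders, so extending an order can never raise its support.
    """
    hits = sum(
        all(s[a] < s[b] for a, b in zip(order, order[1:]))
        for s in row_samples
    )
    return hits / len(row_samples)


def supporting_rows(samples, order, min_prob):
    """Rows whose estimated support for `order` reaches min_prob."""
    return [r for r, rs in enumerate(samples)
            if order_support(rs, order) >= min_prob]


def mine_orders(intervals, min_prob=0.5, min_rows=2, n_samples=2000, seed=0):
    """Level-wise (Apriori-style) search for column orders shared by >= min_rows rows."""
    n_cols = len(intervals[0])
    samples = sample_rows(intervals, n_samples, seed)
    # Level 1: a single column is trivially followed by every row.
    level = {(c,): list(range(len(intervals))) for c in range(n_cols)}
    results = {}
    while level:
        next_level = {}
        for order in level:
            for c in range(n_cols):
                if c in order:
                    continue
                cand = order + (c,)  # extend a frequent order by one column
                # Apriori pruning: each sub-order that drops one column
                # must itself have been frequent at the previous level.
                if any(cand[:i] + cand[i + 1:] not in level
                       for i in range(len(cand))):
                    continue
                rows = supporting_rows(samples, cand, min_prob)
                if len(rows) >= min_rows:
                    next_level[cand] = rows
        results.update(next_level)
        level = next_level
    return results


if __name__ == "__main__":
    intervals = [
        [(0, 1), (2, 3), (4, 5)],  # row 0: column 0 < column 1 < column 2
        [(1, 2), (3, 4), (5, 6)],  # row 1: same trend, shifted up
        [(4, 5), (2, 3), (0, 1)],  # row 2: reversed trend
    ]
    for order, rows in mine_orders(intervals, min_prob=0.9).items():
        print(order, rows)
```

The toy matrix uses disjoint intervals within each row, so every support is exactly 0 or 1. The script reports four orders, each supported by rows 0 and 1: (0, 1), (0, 2), (1, 2), and (0, 1, 2). For a NormDist-style matrix, one could instead store (mean, std) pairs and draw values with `random.gauss`. That swap, like the rest of the sketch, is only an illustration.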

Bibliographic Details

Source
ACM Transactions on Database Systems | 2014, Issue 1 | pp. 6.1-6.43 | 43 pages
Authors
QIONG FANG; WILFRED NG; JIANLIN FENG; YULIANG LI
Affiliations

Computer Science and Engineering Department, Hong Kong University of Science and Technology;

Computer Science and Engineering Department, Hong Kong University of Science and Technology;

School of Software, Sun Yat-Sen University;

Computer Science and Engineering Department, University of California, San Diego;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Text language: English
Keywords
Order-preserving submatrices; probabilistic matrices; probabilistic support; OPSM mining

Similar Literature

1. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences [J]. Yun Xue, Zhengling Liao, Meihang Li. Computational and Mathematical Methods in Medicine, 2015, Issue 1.

2. Mining Order-Preserving Submatrices from Data with Repeated Measurements [J]. Kevin Y. Yip, Ben Kao, Xinjie Zhu. IEEE Transactions on Knowledge and Data Engineering, 2013, Issue 7.

3. Mining Bucket Order-Preserving SubMatrices in Gene Expression Data [J]. Qiong Fang, Wilfred Ng, Jianlin Feng. IEEE Transactions on Knowledge and Data Engineering, 2012, Issue 12.

4. Mining Order-Preserving Submatrices Based on Frequent Sequential Pattern Mining [C]. Yun Xue, Yuting Li, Weijun Deng. International Conference on Health Information Science, 2014.

5. The Central Limit Theorem for Linear Spectral Statistics of Submatrices of the Gaussian Wigner Random Matrices [D]. Matthew Reed, 2014.

6. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences [O]. Yun Xue, Zhengling Liao, Meihang Li, 2015.

7. Mining Order-Preserving Submatrices from Data with Repeated Measurements [O]. K. Y. Yip, C. K. Chui, D. W. L. Cheung, 2013.

