Mining Non-Redundant High Order Correlations in Binary Data

Abstract

Many approaches have been proposed to find correlations in binary data. Usually, these methods focus on pairwise correlations. In biological applications, it is important to find correlations that involve more than two features. Moreover, a set of strongly correlated features should be non-redundant in the sense that the correlation is strong only when all the interacting features are considered together: removing any feature greatly reduces the correlation.

In this paper, we explore the problem of finding non-redundant high order correlations in binary data. The high order correlations are formalized using multi-information, a generalization of pairwise mutual information. To reduce redundancy, we require every subset of a strongly correlated feature subset to be weakly correlated. Such feature subsets are referred to as Non-redundant Interacting Feature Subsets (NIFS). Finding all NIFSs is computationally challenging because, in addition to enumerating feature combinations, we also need to check all their subsets for redundancy. We study several properties of NIFSs and show that these properties are useful in developing efficient algorithms. We further develop two sets of upper and lower bounds on the correlations, which can be incorporated in the algorithm to prune the search space. A simple and effective pruning strategy based on pairwise mutual information is also developed to prune the search space further. The efficiency and effectiveness of our approach are demonstrated through extensive experiments on synthetic and real-life datasets.
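The definition in the abstract can be illustrated with a brute-force check. The following is a minimal sketch, not the paper's algorithm: it assumes the standard definition of multi-information (sum of marginal entropies minus the joint entropy, estimated from empirical frequencies) and two hypothetical thresholds, `strong` and `weak`, standing in for "strongly" and "weakly" correlated. The function names are illustrative, and none of the bounds or pruning strategies mentioned above are implemented.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, cols):
    """Empirical joint entropy (in bits) of the selected columns."""
    n = len(rows)
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def multi_information(rows, cols):
    """Sum of the marginal entropies minus the joint entropy."""
    return sum(entropy(rows, [c]) for c in cols) - entropy(rows, cols)


def is_nifs(rows, cols, strong, weak):
    """Brute-force test: the full set must be strongly correlated and
    every proper subset of size >= 2 must be weakly correlated.
    `strong` and `weak` are assumed thresholds, not values from the paper."""
    if multi_information(rows, cols) < strong:
        return False
    for size in range(2, len(cols)):
        for sub in combinations(cols, size):
            if multi_information(rows, sub) > weak:
                return False
    return True


# Feature 2 is the XOR of features 0 and 1: no pair is correlated,
# but the three features together are.
xor_rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

# Feature 2 is a copy of feature 0: the pair (0, 2) already carries
# the correlation, so the triple is redundant.
copy_rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]

xor_result = is_nifs(xor_rows, (0, 1, 2), strong=0.9, weak=0.1)
copy_result = is_nifs(copy_rows, (0, 1, 2), strong=0.9, weak=0.1)
```

The exhaustive subset check is exponential in the number of features, which is the cost the abstract says the bounds and pruning strategies are meant to avoid.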

Bibliographic details

Journal: other
Authors: Xiang Zhang; Feng Pan; Wei Wang; Andrew Nobel
Pages: 1178–1188
Total pages: 29
Original format: PDF

Similar documents

1. Paper: A Parallel Algorithm for Mining Non-Redundant Recurrent Rules from a Sequence Database [J]. Seung-Yong Yoon, Hirohisa Seki. Journal of Advanced Computational Intelligence and Intelligent Informatics. 2019, issue 5a139
2. Mining non-redundant recurrent rules from a sequence database [J]. SeungYong Yoon, Hirohisa Seki. International journal of computational i. 2018, issue 3a4
3. Mining Non-Redundant Substitution Rules Between Sets of Items in Large Databases [J]. Chen Yi-Chun, Lee Guanling. Journal of information science and engineering. 2015, issue 2
4. Mining Non-Redundant High Order Correlations in Binary Data [C]. Xiang Zhang, Feng Pan, Wei Wang. International conference on very large data bases; VLDB 2008. 2008
5. Multimedia data mining and retrieval for multimedia databases using associations and correlations [D]. Lin, Lin. 2010
6. A Graph-Theoretic Approach for Identifying Non-Redundant and Relevant Gene Markers from Microarray Data Using Multiobjective Binary PSO [O]. Monalisa Mandal, Anirban Mukhopadhyay
7. Mining Non-Redundant High Order Correlations in Binary Data [O]. Xiang Zhang, Feng Pan, Wei Wang. 2010
8. Application of Data Mining and Knowledge Discovery Techniques to Enhance Binary Target Detection and Decision-Making for Compromised Visual Images [R]. Repperger, D. W., Phillips, C. A., Schrider, C. D. 2004
