Coupled Attribute Similarity Learning on Categorical Data

Wang Can; Dong Xiangjun; Zhou Fei; Cao Longbing; Chi Chi-Hung

Abstract

Attribute independence has been taken as a major assumption in the limited research that has been conducted on similarity analysis for categorical data, especially in unsupervised learning. However, in real-world data sources, attributes are more or less associated with each other in terms of certain coupling relationships. Accordingly, recent works on attribute dependency aggregation have introduced the co-occurrence of attribute values to explore attribute coupling, but they only present a local picture in analyzing categorical data similarity. This is inadequate for deep analysis, and the computational complexity grows exponentially as the data scale increases. This paper proposes an efficient data-driven similarity learning approach that generates a coupled attribute similarity measure for nominal objects with attribute couplings to capture a global picture of attribute similarity. It involves the frequency-based intra-coupled similarity within an attribute and the inter-coupled similarity based on value co-occurrences between attributes, as well as their integration on the object level. In particular, four measures are designed for the inter-coupled similarity to calculate the similarity between two categorical values by considering their relationships with other attributes in terms of power set, universal set, joint set, and intersection set. The theoretical analysis reveals the equivalent accuracy and superior efficiency of the measure based on the intersection set, particularly for large-scale data sets. Intensive experiments on data structure and clustering algorithms incorporating the coupled dissimilarity metric achieve a significant performance improvement over state-of-the-art measures and algorithms on 13 UCI data sets, which is confirmed by the statistical analysis.
The experimental results show that the proposed coupled attribute similarity is generic, and can effectively and efficiently capture the intrinsic and global interactions within and between attributes, especially for large-scale categorical data sets. In addition, two new coupled categorical clustering algorithms, CROCK and CLIMBO, are proposed, and they both outperform the original ones in terms of clustering quality on UCI data sets and bibliographic data.
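
The abstract names the ingredients of the measure (a frequency-based intra-coupled similarity, an inter-coupled similarity over the intersection set of co-occurring values, and their integration) but gives no formulas. The Python sketch below shows one way these pieces could fit together. Everything specific in it is an assumption made for illustration and is not taken from this record: the frequency formula, the use of the minimum of conditional probabilities over shared values, the equal weighting of the other attributes, the product used to combine the two parts, and all function names.

```python
from collections import Counter


def intra_coupled_similarity(column, x, y):
    """Frequency-based similarity of two values of a single attribute.

    Assumed form: f(x) * f(y) / (f(x) + f(y) + f(x) * f(y)),
    where f counts occurrences of a value in the column.
    """
    counts = Counter(column)
    fx, fy = counts[x], counts[y]
    if fx == 0 or fy == 0:
        return 0.0
    return (fx * fy) / (fx + fy + fx * fy)


def conditional_distribution(rows, j, k, value):
    """Estimate P(attribute k = w | attribute j = value) from co-occurrences."""
    co_counts = Counter(row[k] for row in rows if row[j] == value)
    total = sum(co_counts.values())
    return {w: c / total for w, c in co_counts.items()} if total else {}


def inter_coupled_similarity(rows, j, k, x, y):
    """Similarity of values x and y of attribute j as seen through attribute k.

    Assumed form: only values of attribute k that co-occur with BOTH x and y
    (the intersection set) contribute, each with the smaller of its two
    conditional probabilities.
    """
    px = conditional_distribution(rows, j, k, x)
    py = conditional_distribution(rows, j, k, y)
    shared = px.keys() & py.keys()
    return sum(min(px[w], py[w]) for w in shared)


def coupled_value_similarity(rows, j, x, y):
    """Combine intra- and inter-coupled parts for two values of attribute j.

    Assumptions: the other attributes are weighted equally, and the two
    parts are multiplied.
    """
    column = [row[j] for row in rows]
    others = [k for k in range(len(rows[0])) if k != j]
    intra = intra_coupled_similarity(column, x, y)
    inter = sum(inter_coupled_similarity(rows, j, k, x, y) for k in others)
    inter /= len(others)
    return intra * inter


# Toy data set with two nominal attributes: (colour, size).
rows = [
    ("red", "small"),
    ("red", "small"),
    ("blue", "small"),
    ("blue", "large"),
    ("green", "large"),
]

print(coupled_value_similarity(rows, 0, "red", "blue"))   # 0.25
print(coupled_value_similarity(rows, 0, "red", "green"))  # 0.0
```

In the toy data, "red" and "blue" share the co-occurring size "small", so their similarity is positive, while "red" and "green" share no size value and score zero. Restricting the sum to shared values means each pair of values only touches the values that actually co-occur with both, which is the kind of saving the abstract attributes to the intersection-set measure.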

Bibliographic record

Source: Neural Networks and Learning Systems, IEEE Transactions on, 2015, no. 4, pp. 781-797 (17 pages)
Authors: Wang Can; Dong Xiangjun; Zhou Fei; Cao Longbing; Chi Chi-Hung
Author affiliation: Commonwealth Scientific and Industrial Research Organisation, Sandy Bay, TAS, Australia
Original format: PDF
Language: English
Keywords: Algorithm design and analysis; Clustering algorithms; Couplings; Frequency measurement; Motion pictures; Unsupervised learning; Clustering; coupled attribute similarity; coupled object analysis; similarity analysis

Similar literature

1. A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set [J]. Amir Ahmad, Lipika Dey. Pattern Recognition Letters, 2007, no. 1.
2. Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number [J]. Cheung Y.-M., Jia H. Pattern Recognition: The Journal of the Pattern Recognition Society, 2013, no. 8.
3. Unsupervised Coupled Metric Similarity for Non-IID Categorical Data [J]. Songlei Jian, Longbing Cao, Kai Lu. Knowledge and Data Engineering, IEEE Transactions on, 2018, no. 9.
4. Extensible Attribute Similarity Data Mining for Categorical Data Streams in Web Usage Framework [C]. N. Pushpalatha, S. Sai Satyanarayana Reddy, N. Subhash Chandra. International Conference on ICT on Sustainable Development, 2020.
5. Similarities and differences between heterosexual and homosexual couples based on MARQ data [D]. Shattuck, Kraig S. 2015.
6. Coupled Node Similarity Learning for Community Detection in Attributed Networks [O]. Fanrong Meng, Xiaobin Rui, Zhixiao Wang. 2018.
7. Heuristic Algorithm for Interpretation of Non-Atomic Categorical Attributes in Similarity-based Fuzzy Databases - Scalability Evaluation [O]. M. Shahriar Hossain, Rafal A. Angryk. 2016.
8. Similarity Measures on Binary Attribute Data [R]. Janowitz, M. F. 1979.
