Journal: IEEE Transactions on Neural Networks and Learning Systems

Coupled Attribute Similarity Learning on Categorical Data

Abstract

Attribute independence has been taken as a major assumption in the limited research that has been conducted on similarity analysis for categorical data, especially in unsupervised learning. However, in real-world data sources, attributes are more or less associated with each other through certain coupling relationships. Accordingly, recent work on attribute dependency aggregation has introduced the co-occurrence of attribute values to explore attribute coupling, but it presents only a local picture when analyzing categorical data similarity. This is inadequate for deep analysis, and the computational complexity grows exponentially as the data scale increases. This paper proposes an efficient, data-driven similarity learning approach that generates a coupled attribute similarity measure for nominal objects with attribute couplings, capturing a global picture of attribute similarity. It involves the frequency-based intra-coupled similarity within an attribute, the inter-coupled similarity based on value co-occurrences between attributes, and their integration at the object level. In particular, four measures are designed for the inter-coupled similarity; each calculates the similarity between two categorical values by considering their relationships with other attributes in terms of the power set, universal set, joint set, or intersection set. The theoretical analysis reveals that the measure based on the intersection set offers equivalent accuracy and superior efficiency, particularly for large-scale data sets. Extensive experiments on data structure analysis and clustering algorithms that incorporate the coupled dissimilarity metric show a significant performance improvement over state-of-the-art measures and algorithms on 13 UCI data sets, which is confirmed by statistical analysis.
The experimental results show that the proposed coupled attribute similarity is generic and can effectively and efficiently capture the intrinsic and global interactions within and between attributes, especially for large-scale categorical data sets. In addition, two new coupled categorical clustering algorithms, CROCK and CLIMBO, are proposed; both outperform the original algorithms in clustering quality on UCI data sets and bibliographic data.
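The abstract above names the building blocks (a frequency-based intra-coupled similarity, an inter-coupled similarity from value co-occurrences, and their integration at the object level) but does not give their formulas. The Python sketch below is illustrative only and assumes one common formulation of coupled nominal similarity rather than the paper's exact definitions. Under that assumption, the intra-coupled similarity of values x and y is |G(x)||G(y)| / (|G(x)| + |G(y)| + |G(x)||G(y)|), where |G(v)| is the number of objects taking value v. The inter-coupled similarity is built from conditional probabilities P(a_k = v | a_j = x). The two parts are combined by a product, and the object-level dissimilarity is an unweighted sum of 1 minus that value similarity. Only two of the four inter-coupled variants (power set and intersection set) are sketched. All function names, the toy data set, the equal attribute weights, and the rule that identical values have similarity 1 are hypothetical.

```python
"""Minimal sketch of a coupled attribute similarity for categorical objects.

ASSUMPTION: the formulas follow one common formulation of coupled nominal
similarity; they are illustrative and not taken verbatim from the paper.
"""
from collections import Counter
from itertools import combinations

# Hypothetical toy nominal data: each row is an object, each column an attribute.
DATA = [
    ("red", "small", "apple"),
    ("red", "small", "cherry"),
    ("green", "small", "apple"),
    ("green", "large", "melon"),
    ("yellow", "large", "melon"),
]


def intra_similarity(data, j, x, y):
    """Frequency-based intra-coupled similarity of values x and y of attribute j."""
    counts = Counter(row[j] for row in data)
    fx, fy = counts[x], counts[y]
    return fx * fy / (fx + fy + fx * fy)


def conditional_probs(data, j, value, k):
    """Estimate P(a_k = v | a_j = value) for each value v co-occurring with it."""
    rows = [row for row in data if row[j] == value]
    counts = Counter(row[k] for row in rows)
    return {v: c / len(rows) for v, c in counts.items()}


def inter_intersection(data, j, x, y, k):
    """Intersection-set form: sum of pointwise minima over shared co-occurring values."""
    px = conditional_probs(data, j, x, k)
    py = conditional_probs(data, j, y, k)
    return sum(min(px[v], py[v]) for v in px.keys() & py.keys())


def inter_power_set(data, j, x, y, k):
    """Power-set form: minimise 2 - P(W | x) - P(not W | y) over every subset W.

    Enumerates all 2^|V_k| subsets of attribute k's values, so it only scales
    to very small value domains.
    """
    px = conditional_probs(data, j, x, k)
    py = conditional_probs(data, j, y, k)
    values = sorted({row[k] for row in data})
    best = float("inf")
    for size in range(len(values) + 1):
        for subset in combinations(values, size):
            w = set(subset)
            p_w_x = sum(px.get(v, 0.0) for v in w)
            p_not_w_y = sum(py.get(v, 0.0) for v in values if v not in w)
            best = min(best, 2.0 - p_w_x - p_not_w_y)
    return best


def coupled_value_similarity(data, j, x, y):
    """Intra-coupled similarity times the equally weighted mean inter-coupled similarity."""
    if x == y:
        return 1.0  # ASSUMPTION: identical values are treated as fully similar
    others = [k for k in range(len(data[0])) if k != j]
    inter = sum(inter_intersection(data, j, x, y, k) for k in others) / len(others)
    return intra_similarity(data, j, x, y) * inter


def coupled_dissimilarity(data, a, b):
    """Object-level dissimilarity: sum over attributes of (1 - value similarity)."""
    return sum(
        1.0 - coupled_value_similarity(data, j, a[j], b[j]) for j in range(len(a))
    )


if __name__ == "__main__":
    for a, b in combinations(DATA, 2):
        print(a, b, round(coupled_dissimilarity(DATA, a, b), 3))
```

Under these assumed definitions, the power-set form equals 1 minus the total variation distance between the two conditional distributions, which is exactly the sum of pointwise minima. Both functions therefore return the same value, but the intersection form needs only one pass over co-occurring values instead of enumerating every subset. This mirrors, without reproducing, the equivalence-and-efficiency result the abstract reports.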