Context-Based Distance Learning for Categorical Data Clustering


Abstract

Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, categorical attributes have unordered values, so it is difficult to define a distance between pairs of values of the same attribute. In this paper, we propose a method to learn a context-based distance for categorical attributes. The key intuition is that the distance between two values of a categorical attribute A_i can be determined by how the values of the other attributes A_j are distributed in the dataset objects: if the A_j values are similarly distributed in the groups of objects corresponding to two distinct values of A_i, the distance between those two values is low. We also propose a solution to the critical problem of choosing the attributes A_j. We validate our approach on various real-world and synthetic datasets by embedding our distance learning method in both a partitional and a hierarchical clustering algorithm. Experimental results show that our method is competitive with state-of-the-art categorical data clustering approaches.
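The intuition in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes a normalized Euclidean distance between the conditional distributions of the context attributes, and it takes the context attributes as an explicit argument rather than selecting them automatically. The function name `context_distance` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_distance(rows, target, context, x1, x2):
    """Distance between values x1 and x2 of the attribute `target`.

    Assumption (not taken from the abstract): the distance is the
    root-mean-square difference between P(A_j = y | A_i = x1) and
    P(A_j = y | A_i = x2), over every value y of every context attribute A_j.
    """
    if not context:
        raise ValueError("at least one context attribute is required")

    # counts[x][attr][y] = number of rows with target == x and attr == y
    counts = {x1: defaultdict(Counter), x2: defaultdict(Counter)}
    totals = Counter()
    for row in rows:
        x = row[target]
        if x in counts:
            totals[x] += 1
            for attr in context:
                counts[x][attr][row[attr]] += 1

    if totals[x1] == 0 or totals[x2] == 0:
        raise ValueError("both values must occur in the data")

    squared, n_terms = 0.0, 0
    for attr in context:
        # Every value the context attribute takes anywhere in the data
        for y in {row[attr] for row in rows}:
            p1 = counts[x1][attr][y] / totals[x1]
            p2 = counts[x2][attr][y] / totals[x2]
            squared += (p1 - p2) ** 2
            n_terms += 1
    return sqrt(squared / n_terms)


# Hypothetical toy dataset: "color" is the target attribute A_i,
# "shape" and "size" play the role of the context attributes A_j.
rows = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "crimson", "shape": "round", "size": "small"},
    {"color": "blue", "shape": "square", "size": "large"},
    {"color": "blue", "shape": "square", "size": "small"},
]

print(context_distance(rows, "color", ["shape", "size"], "red", "crimson"))
print(context_distance(rows, "color", ["shape", "size"], "red", "blue"))
```

In this toy data, "red" and "crimson" co-occur with exactly the same shape and size values, so their distance is zero, while "red" and "blue" have different contexts and end up farther apart. A pairwise matrix of such distances could then be passed to a partitional or hierarchical clustering algorithm, as the abstract describes.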


Bibliographic details

Source: International Conference on Intelligent Data Analysis, 2009, 12 pages
Authors: Dino Ienco; Ruggero G. Pensa; Rosa Meo
Original format: PDF
Chinese Library Classification: TP3-53

Similar literature


1. CBDL: Context-Based Distance Learning for Categorical Attributes [J]. Zeinab Khorshidpour, Sattar Hashemi, Ali Hamzeh. International Journal of Intelligent Systems, 2011, No. 11
2. From Context to Distance: Learning Dissimilarity for Categorical Data Clustering [J]. Dino Ienco, Ruggero G. Pensa, Rosa Meo. ACM Transactions on Knowledge Discovery from Data, 2012, No. 1
3. A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set [J]. Amir Ahmad, Lipika Dey. Pattern Recognition Letters, 2007, No. 1
4. Context-Based Distance Learning for Categorical Data Clustering [C]. Dino Ienco, Ruggero G. Pensa, Rosa Meo. Advances in Intelligent Data Analysis VIII, 2009
5. Learning Networks with Categorical Data Using Distance Correlation, and a Novel Graph-based Multivariate Test [D]. Tinker, Jian. 2020
6. Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model [O]. Dongyang Yang, Wei Xu. 2020
7. Distance based Clustering for Categorical Data Extended Abstract [O]. Dino Ienco, Rosa Meo. 2013
