Conference: International Conference on Intelligent Data Analysis

Context-Based Distance Learning for Categorical Data Clustering

Abstract

Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical values, the values of a categorical attribute have no natural order, so it is difficult to define a distance between pairs of values of the same attribute. In this paper, we propose a method to learn a context-based distance for categorical attributes. The key intuition is that the distance between two values of a categorical attribute A_i can be determined by how the values of the other attributes A_j are distributed across the dataset objects: if the values of A_j are similarly distributed in the groups of objects that take each of the two values of A_i, the distance between those two values is low. We also propose a solution to a critical issue of this approach, namely the choice of the context attributes A_j. We validate our approach on various real-world and synthetic datasets by embedding our distance learning method in both a partitional and a hierarchical clustering algorithm. Experimental results show that our method is competitive with state-of-the-art approaches to categorical data clustering.
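The abstract states the intuition but gives neither the distance formula nor the context-selection criterion, so the Python block below is only a minimal sketch under stated assumptions, not the paper's method. It compares the conditional distributions P(A_j = y | A_i = a) and P(A_j = y | A_i = b) over a set of context attributes and combines them with a normalized Euclidean distance (an assumption; the paper's exact formula may differ). For choosing the A_j it swaps in a simple ranking by symmetric uncertainty with A_i and keeps the top k. That criterion, the parameter k, the toy dataset, and all function names (conditional_distributions, context_distance, symmetric_uncertainty, select_context) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_distributions(rows, i, j):
    """Map each value a of attribute i to P(A_j = y | A_i = a), as a dict y -> prob."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[i]][row[j]] += 1
    dists = {}
    for a, c in counts.items():
        total = sum(c.values())
        dists[a] = {y: n / total for y, n in c.items()}
    return dists


def context_distance(rows, i, context, a, b):
    """Distance between values a and b of attribute i.

    Assumption (not the paper's stated formula): Euclidean distance between the
    conditional distributions of each context attribute given A_i = a and
    A_i = b, summed over the context and normalized by the number of context values.
    """
    squared, n_values = 0.0, 0
    for j in context:
        dists = conditional_distributions(rows, i, j)
        domain_j = {row[j] for row in rows}
        for y in domain_j:
            squared += (dists[a].get(y, 0.0) - dists[b].get(y, 0.0)) ** 2
        n_values += len(domain_j)
    return math.sqrt(squared / n_values) if n_values else 0.0


def entropy(values):
    """Shannon entropy (in bits) of a list of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(rows, i, j):
    """SU(A_i, A_j) = 2 * I(A_i; A_j) / (H(A_i) + H(A_j)), a value in [0, 1]."""
    xi = [row[i] for row in rows]
    xj = [row[j] for row in rows]
    h_i, h_j = entropy(xi), entropy(xj)
    if h_i + h_j == 0:
        return 0.0
    mutual_info = h_i + h_j - entropy(list(zip(xi, xj)))
    return 2.0 * mutual_info / (h_i + h_j)


def select_context(rows, i, k):
    """Hypothetical criterion: keep the k attributes with the highest SU with A_i."""
    candidates = [j for j in range(len(rows[0])) if j != i]
    candidates.sort(key=lambda j: symmetric_uncertainty(rows, i, j), reverse=True)
    return candidates[:k]


# Toy dataset: (color, size, shape, noise); attribute 3 is independent of color.
rows = [
    ("red", "small", "round", "x"),
    ("red", "small", "round", "y"),
    ("orange", "small", "round", "x"),
    ("orange", "large", "long", "y"),
    ("blue", "large", "long", "x"),
    ("blue", "large", "long", "y"),
]

context = select_context(rows, 0, k=2)  # keeps attributes 1 and 2, drops noise attribute 3
print(context_distance(rows, 0, context, "red", "orange"))  # → 0.5
print(context_distance(rows, 0, context, "red", "blue"))    # → 1.0
```

On this toy data, "red" is closer to "orange" than to "blue" because the size and shape distributions of the "red" objects overlap more with those of the "orange" objects. A matrix of such pairwise value distances could then be supplied to a partitional or hierarchical clustering algorithm, as the abstract describes, although how the paper performs that embedding is not shown here.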