ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 12-15, 2007; San Jose, CA (US)
Joint Cluster Analysis of Attribute and Relationship Data Without A-Priori Specification of the Number of Clusters

Abstract

In many applications, both attribute and relationship data are available, carrying complementary information about real-world entities. In such cases, a joint analysis of both types of data can yield more accurate results than classical clustering algorithms that use either only attribute data or only relationship (graph) data. The Connected k-Center (CkC) has been proposed as the first joint cluster analysis model to discover k clusters that are cohesive on both attribute and relationship data. However, it is well known that prior knowledge of the number of clusters is often unavailable in applications such as community identification and hotspot analysis. In this paper, we introduce and formalize the problem of discovering an a-priori unspecified number of clusters in the context of joint cluster analysis of attribute and relationship data, called the Connected X Clusters (CXC) problem. True clusters are assumed to be compact and distinct from their neighboring clusters in terms of attribute data, and internally connected in terms of relationship data. Unlike in classical attribute-based clustering methods, the neighborhood of clusters is defined not in terms of attribute data but in terms of relationship data. To efficiently solve the CXC problem, we present JointClust, an algorithm that adopts a dynamic two-phase approach. In the first phase, we find so-called cluster atoms. We provide a probability analysis for this phase, which gives us a probabilistic guarantee that each true cluster is represented by at least one of the initial cluster atoms. In the second phase, these cluster atoms are merged in a bottom-up manner, resulting in a dendrogram. The final clustering is determined by our objective function. Our experimental evaluation on several real datasets demonstrates that JointClust indeed discovers meaningful and accurate clusterings without requiring the user to specify the number of clusters.
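The two-phase idea described in the abstract can be illustrated with a small Python sketch. This is not the JointClust algorithm itself: the abstract does not give the atom construction, the merge criterion, or the objective function, so everything below is an assumption made for illustration. The hypothetical `grow_atoms` grows connected atoms from random seeds by breadth-first search, and the hypothetical `merge_atoms` repeatedly merges the pair of graph-adjacent clusters whose attribute centroids are closest, recording each level of the resulting dendrogram. Selecting the final level is left out because the source's objective function is not specified here.

```python
import random
from collections import deque
from math import dist


def grow_atoms(adj, num_seeds, rng):
    """Phase 1 (assumed): multi-source BFS from random seeds.

    Each node takes the label of the neighbor that reached it first,
    so every atom is connected in the relationship graph.
    """
    seeds = rng.sample(sorted(adj), num_seeds)
    label = {s: i for i, s in enumerate(seeds)}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in label:
                label[v] = label[u]
                queue.append(v)
    return label


def centroid(points, members):
    """Mean attribute vector of a group of nodes."""
    dims = len(points[members[0]])
    return tuple(sum(points[m][d] for m in members) / len(members)
                 for d in range(dims))


def merge_atoms(points, adj, label):
    """Phase 2 (assumed): bottom-up merging of graph-adjacent clusters.

    Neighborhood comes from the graph; closeness comes from attributes.
    Returns the list of clusterings, one per dendrogram level.
    """
    clusters = {}
    for node, c in label.items():
        clusters.setdefault(c, []).append(node)
    levels = [sorted(sorted(m) for m in clusters.values())]
    while True:
        owner = {n: c for c, members in clusters.items() for n in members}
        # Cluster pairs linked by at least one edge of the graph.
        pairs = {(min(owner[u], owner[v]), max(owner[u], owner[v]))
                 for u in owner for v in adj[u]
                 if v in owner and owner[u] != owner[v]}
        if not pairs:
            return levels
        # Merge the adjacent pair with the closest centroids (ties by id).
        a, b = min(pairs, key=lambda p: (dist(centroid(points, clusters[p[0]]),
                                              centroid(points, clusters[p[1]])), p))
        clusters[a] = clusters[a] + clusters.pop(b)
        levels.append(sorted(sorted(m) for m in clusters.values()))


# Toy data: two attribute-compact triangles joined by one bridging edge (2-3).
points = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (10, 10), 4: (10, 11), 5: (11, 10)}
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

atoms = grow_atoms(adj, num_seeds=3, rng=random.Random(0))
levels = merge_atoms(points, adj, atoms)
for level in levels:
    print(level)
```

Because only graph-adjacent clusters are merge candidates, two groups that are close in attribute space but unlinked in the graph are never merged, which mirrors the abstract's point that cluster neighborhood is defined on relationship data. Which level of `levels` to keep would depend on an objective function that this sketch does not attempt to reproduce.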