Joint Cluster Analysis of Attribute and Relationship Data Without A-Priori Specification of the Number of Clusters

Abstract

In many applications, attribute and relationship data are available, carrying complementary information about real-world entities. In such cases, a joint analysis of both types of data can yield more accurate results than classical clustering algorithms that use either only attribute data or only relationship (graph) data. The Connected k-Center (CkC) has been proposed as the first joint cluster analysis model to discover k clusters that are cohesive on both attribute and relationship data. However, it is well known that prior knowledge of the number of clusters is often unavailable in applications such as community identification and hotspot analysis. In this paper, we introduce and formalize the problem of discovering an a-priori unspecified number of clusters in the context of joint cluster analysis of attribute and relationship data, called the Connected X Clusters (CXC) problem. True clusters are assumed to be compact and distinctive from their neighboring clusters in terms of attribute data, and internally connected in terms of relationship data. Unlike classical attribute-based clustering methods, the neighborhood of clusters is defined not in terms of attribute data but in terms of relationship data. To efficiently solve the CXC problem, we present JointClust, an algorithm that adopts a dynamic two-phase approach. In the first phase, we find so-called cluster atoms. We provide a probability analysis for this phase, which gives a probabilistic guarantee that each true cluster is represented by at least one of the initial cluster atoms. In the second phase, these cluster atoms are merged in a bottom-up manner, resulting in a dendrogram. The final clustering is determined by our objective function. Our experimental evaluation on several real datasets demonstrates that JointClust indeed discovers meaningful and accurate clusterings without requiring the user to specify the number of clusters.
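
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not JointClust itself, and every name in it (`min_seeds`, `merge_atoms`, `flat_clusters`) is hypothetical. For phase 1, it assumes seeds are drawn uniformly at random with replacement and applies a simple union bound over k true clusters, which is only one possible form of a "each true cluster is hit" guarantee. For phase 2, it assumes each cluster atom is summarized by a numeric attribute centroid and a size, repeatedly merges the graph-adjacent pair of atoms with the smallest Euclidean centroid distance, and returns the merge sequence (the dendrogram); choosing the final cut is left to the caller rather than reproducing the paper's objective function.

```python
import math


def min_seeds(n, min_size, k, delta):
    """Hypothetical phase-1 bound: smallest number m of seeds drawn uniformly
    with replacement from n nodes such that, by a union bound over k true
    clusters of at least min_size nodes each,
    P(some true cluster gets no seed) <= k * (1 - min_size / n) ** m <= delta."""
    p_miss = 1.0 - min_size / n  # chance that one draw misses a given cluster
    if p_miss <= 0.0 or delta >= k:
        return 1  # bound holds trivially; still use at least one seed
    return math.ceil(math.log(delta / k) / math.log(p_miss))


def merge_atoms(centroids, sizes, edges):
    """Hypothetical phase-2 sketch: bottom-up merging of cluster atoms.

    Only atoms adjacent in the relationship graph may merge, so the
    neighborhood is defined by relationship data; among adjacent pairs the
    closest attribute centroids merge first. Returns the dendrogram as a list
    of (left_id, right_id, new_id, distance); new ids start at len(centroids).
    """
    cent = {i: tuple(c) for i, c in enumerate(centroids)}
    size = dict(enumerate(sizes))
    adj = {i: set() for i in cent}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    next_id = len(cent)
    merges = []
    while True:
        # Candidate pairs: graph-adjacent atoms only (each pair listed once).
        pairs = [(a, b) for a in adj for b in adj[a] if a < b]
        if not pairs:
            break  # no adjacent atoms left (graph may be disconnected)
        a, b = min(pairs, key=lambda p: math.dist(cent[p[0]], cent[p[1]]))
        d = math.dist(cent[a], cent[b])
        w = size[a] + size[b]
        # Size-weighted mean of the two attribute centroids.
        new_c = tuple((size[a] * x + size[b] * y) / w
                      for x, y in zip(cent[a], cent[b]))
        # The merged atom inherits the union of both neighborhoods.
        neighbours = (adj.pop(a) | adj.pop(b)) - {a, b}
        for v in neighbours:
            adj[v] -= {a, b}
            adj[v].add(next_id)
        adj[next_id] = neighbours
        cent[next_id], size[next_id] = new_c, w
        del cent[a], cent[b], size[a], size[b]
        merges.append((a, b, next_id, d))
        next_id += 1
    return merges


def flat_clusters(n_atoms, merges, steps):
    """Cut the dendrogram after the first `steps` merges; returns sorted lists
    of original atom ids."""
    members = {i: {i} for i in range(n_atoms)}
    for a, b, new, _ in merges[:steps]:
        members[new] = members.pop(a) | members.pop(b)
    return sorted(sorted(m) for m in members.values())
```

For example, `merge_atoms([(0, 0), (5, 5), (0, 0.1)], [1, 1, 1], [(0, 1), (1, 2)])` merges atoms 1 and 2 first and never merges atoms 0 and 2 directly, even though their centroids are the closest pair, because they are not adjacent in the graph. Restricting candidates to adjacent atoms keeps every merged cluster internally connected, assuming each initial atom is itself connected.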

Bibliographic record

Source: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 12-15, 2007; San Jose, CA (US). 2007, pp. 510-519 (10 pages)
Conference location: San Jose, CA (US)
Authors: Flavia Moser; Rong Ge; Martin Ester
Affiliation: School of Computing Science, Simon Fraser University
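Original format: PDF
Language: English
Chinese Library Classification (CLC): Data processing, data processing systems; computing technology, computer technology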
Keywords: algorithms; clustering; graph-structured data; joint cluster analysis; community identification; hotspot analysis

Similar literature

1. Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number [J]. Cheung Y.-M., Jia H. Pattern Recognition: The Journal of the Pattern Recognition Society, 2013, no. 8.
2. Changes in the model of within-cluster distribution of attributes and their effects on cluster analysis of vegetation data [J]. M. B. Dale. Community Ecology, 2007, no. 1.
3. Delineation of gas hydrate reservoirs in the Ulleung Basin using unsupervised multi-attribute clustering without well log data [J]. Lee Jaewook, Byun Joongmoo, Kim Bona. Journal of Natural Gas Science and Engineering, 2017.
4. Joint Cluster Analysis of Attribute and Relationship Data Without A-Priori Specification of the Number of Clusters [C]. Flavia Moser, Rong Ge, Martin Ester. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007.
5. High-Dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products [D]. Zhou, Dunke. 2012.
6. A component overlapping attribute clustering (COAC) algorithm for single-cell RNA sequencing data analysis and potential pathobiological implications [O]. He Peng, Xiangxiang Zeng, Yadi Zhou. 2019.
7. Robust verification and analysis of the pre-clustering algorithm with a-priori non-specification of the number of clusters [O]. Volodymyr Mosorov, Taras Panskyi, Sebastian Biedron. 2015.
8. Application of Cluster Analysis to Aerometric Data. Volume I. Part 1: Clustering, Validation, and Classification of Data. Part 2: Investigation and Report of Cluster Analysis [R]. Crutcher, H. L., Nelson, C., Fairbairn, B. 1980.
