Conference proceedings: Database Systems for Advanced Applications, Part 1

Semi-supervised Clustering of Graph Objects: A Subgraph Mining Approach

Abstract

Semi-supervised clustering, which aims to improve clustering performance with limited supervision, has recently received a lot of attention in the literature. Most existing semi-supervised clustering studies assume that the data are represented in a vector space, e.g., text and relational data. When the data objects have complex structures, e.g., proteins and chemical compounds, those methods are not directly applicable. In this paper, we study the problem of semi-supervised clustering of data objects that are represented as graphs. The supervision information takes the form of pairwise must-link and cannot-link constraints. As there is no predefined feature set for the graph objects, we propose to use discriminative subgraph patterns as the features. We design an objective function that incorporates the constraints to guide the subgraph feature mining and selection process. We derive an upper bound of the objective function, based on which a branch-and-bound algorithm is proposed to speed up subgraph mining. We also introduce a redundancy measure into the feature selection process to reduce redundancy in the feature set. Once the graph objects are represented in the vector space of the discriminative subgraph features, we use semi-supervised kernel K-means to cluster all graph objects. Experimental results on real-world protein datasets demonstrate that the constraint information can effectively guide the feature selection and clustering process and achieve satisfactory clustering performance.
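
The abstract does not give the objective function or the redundancy measure, so the following Python sketch only illustrates the general idea of constraint-guided feature selection. Everything in it is an assumption made for illustration: each subgraph pattern is reduced to the set of graph indices that contain it, the score (cannot-link pairs separated minus must-link pairs separated) is a stand-in for the paper's objective, and Jaccard overlap is a stand-in for its redundancy measure. The branch-and-bound mining step and the kernel K-means step are not shown.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # a pair of graph indices


def constraint_score(support: Set[int],
                     must_links: List[Pair],
                     cannot_links: List[Pair]) -> int:
    """Hypothetical score, not the paper's objective.

    A pattern "separates" a pair when exactly one of the two graphs
    contains it. Separating a cannot-link pair is rewarded; separating
    a must-link pair is penalised.
    """
    def separates(i: int, j: int) -> bool:
        return (i in support) != (j in support)

    gain = sum(1 for i, j in cannot_links if separates(i, j))
    loss = sum(1 for i, j in must_links if separates(i, j))
    return gain - loss


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap of two support sets; used here as an assumed redundancy measure."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_features(supports: Dict[str, Set[int]],
                    must_links: List[Pair],
                    cannot_links: List[Pair],
                    k: int,
                    max_overlap: float = 0.8) -> List[str]:
    """Greedily pick up to k high-scoring patterns, skipping redundant ones."""
    ranked = sorted(
        supports,
        key=lambda name: (-constraint_score(supports[name],
                                            must_links, cannot_links), name),
    )
    chosen: List[str] = []
    for name in ranked:
        if len(chosen) == k:
            break
        # Skip a pattern whose support overlaps too much with one already chosen.
        if all(jaccard(supports[name], supports[c]) <= max_overlap
               for c in chosen):
            chosen.append(name)
    return chosen


# Toy data: four graphs (0..3) and four hypothetical patterns.
supports = {"p1": {0, 1}, "p2": {0, 1}, "p3": {0, 2}, "p4": {2, 3}}
must_links = [(0, 1), (2, 3)]
cannot_links = [(0, 2), (1, 3)]
print(select_features(supports, must_links, cannot_links, k=2))
```

In this toy example, p1 and p2 have identical support sets, so only one of them is kept, and p3 is ranked last because it separates both must-link pairs. The selected patterns would then serve as binary features for the clustering step.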