Combined clustering of graph and attribute data

机译:图形和属性数据的组合聚类

Abstract

In recent years, rapidly increasing amounts of data have been collected and stored for various applications. As modern storage systems provide more disk space at ever lower cost, databases storing huge amounts of information of different types have become ubiquitous. The task of automatically extracting useful and previously unknown knowledge from such data is called data mining. This thesis focuses on the data mining task of clustering, i.e. grouping objects into clusters such that objects assigned to the same cluster are similar to each other, while objects assigned to different clusters are dissimilar. Two of the most common data types are vector data, where each object is represented as a vector of its attributes, and graph data, which represents relationships between objects as edges in a graph. In many applications, both types of data are available simultaneously: the vertices or the edges of a graph carry additional information that can be described as an attribute vector. The aim of this thesis is to develop combined clustering approaches that use graph data and attribute data simultaneously in order to detect clusters that are densely connected in the graph and at the same time show similarity in the attribute space. Since clusters in high-dimensional vector data usually exist only in subspaces of the attribute space, we follow the principle of subspace clustering, which enables the detection of clusters that show similarity only in a subset of the attributes. In this thesis, we introduce combined clustering approaches for graphs with vertex attributes, graphs with edge attributes, and heterogeneous networks with attributed vertices. For all of these data types, our approaches focus on achieving an unbiased combination of graph and attribute data and on avoiding redundancy in the clustering result.
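
The kind of cluster described above, a vertex set that is densely connected in the graph and similar in only a subset of the attributes, can be illustrated with a small membership check. This is a minimal sketch, not the method developed in the thesis: it assumes, for illustration only, that "densely connected" means a γ-quasi-clique (every member is adjacent to at least ⌈γ(|O|−1)⌉ other members) and that "similar in a subspace" means all members lie within a fixed width w in each selected dimension. The functions `is_quasi_clique`, `compact_subspace` and `combined_cluster_check`, the parameters `gamma`, `width` and `min_dims`, and the toy data are all hypothetical.

```python
from math import ceil

# Hypothetical toy data: an undirected graph as adjacency sets and one
# 3-dimensional attribute vector per vertex.
ADJ = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4},
    4: {3, 5},
    5: {4},
}
ATTRS = {
    0: (1.0, 5.0, 0.2),
    1: (1.1, 9.0, 0.3),
    2: (0.9, 1.0, 0.25),
    3: (1.05, 3.0, 0.2),
    4: (7.0, 5.0, 0.9),
    5: (7.2, 5.1, 0.1),
}


def is_quasi_clique(adj, nodes, gamma):
    """Assumed graph criterion: every vertex in `nodes` is adjacent to at
    least ceil(gamma * (|nodes| - 1)) other vertices of `nodes`."""
    nodes = set(nodes)
    required = ceil(gamma * (len(nodes) - 1))
    return all(len(adj[v] & nodes) >= required for v in nodes)


def compact_subspace(attrs, nodes, width):
    """Assumed attribute criterion: return the dimensions in which all
    vertices of the non-empty set `nodes` lie within an interval of
    length `width`."""
    nodes = list(nodes)
    n_dims = len(attrs[nodes[0]])
    subspace = []
    for d in range(n_dims):
        values = [attrs[v][d] for v in nodes]
        if max(values) - min(values) <= width:
            subspace.append(d)
    return subspace


def combined_cluster_check(adj, attrs, nodes, gamma, width, min_dims):
    """Return (is_cluster, subspace): the candidate must be dense in the
    graph AND compact in at least `min_dims` attribute dimensions."""
    subspace = compact_subspace(attrs, nodes, width)
    dense = is_quasi_clique(adj, nodes, gamma)
    return dense and len(subspace) >= min_dims, subspace


if __name__ == "__main__":
    # Vertices 0-3 form a clique but disagree strongly on attribute 1.
    print(combined_cluster_check(ADJ, ATTRS, {0, 1, 2, 3}, 0.8, 0.5, 2))
    # -> (True, [0, 2])
    # Vertices 3-5 form a sparse path and share no compact dimension.
    print(combined_cluster_check(ADJ, ATTRS, {3, 4, 5}, 0.8, 0.5, 2))
    # -> (False, [])
```

Vertices 0 to 3 are fully connected, yet their values in attribute 1 range from 1.0 to 9.0, so they form a cluster only in the subspace of attributes 0 and 2; requiring similarity in the full attribute space would miss them, which is the situation subspace clustering addresses. The check only validates a given candidate; searching for such sets and removing redundant ones, as the abstract describes, is outside this sketch.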

Bibliographic record

  • Author

    Boden, Brigitte

  • Year: 2014
  • Original format: PDF
  • Language: English
