International Conference on Data Mining

A Fully Automated Method for Discovering Community Structures in High Dimensional Data



Abstract

Identifying modules, or natural communities, in large complex networks is fundamental in many fields, including the social sciences, the biological sciences, and engineering. Several methods have recently been developed to identify communities in complex networks automatically by optimizing the modularity function. The advantage of these approaches is that they require no parameter tuning. However, modularity-based methods for community discovery assume that the network structure is given explicitly and is correct. In addition, these methods work best when the network is unweighted and/or sparse. In practice, networks are often either not defined directly or given only as an affinity matrix. In the first case, each node of the network is a point in a high-dimensional space, and different network construction methods yield different networks and therefore different community structures. In the second case, the affinity matrix may define a dense weighted graph, on which modularity-based methods do not perform well. In this work, we propose a very simple algorithm that automatically identifies community structures in both types of data. Our approach uses k-nearest-neighbor (kNN) network construction to capture the topology embedded in high-dimensional data and applies a modularity-based algorithm to identify the optimal community structure. A key feature of our approach is that network construction is integrated into the community identification process, so the whole procedure is parameter-free. Furthermore, our method can suggest appropriate preprocessing or normalization of the data to improve community identification. We tested our method on several synthetic and real data sets and evaluated its performance using internal or external accuracy indices. Compared with several existing approaches, our method is not only fully automatic but also achieves the best overall accuracy.
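The abstract does not spell out the exact procedure, so the following is only a minimal sketch of the general idea (kNN graph construction coupled with modularity optimization), under several assumptions that are not from the paper: similarity is negative Euclidean distance (a precomputed affinity matrix can be passed to `knn_graph` instead); the kNN graph is symmetrized and unweighted; communities are found with a single-level Louvain-style local-moving heuristic, which stands in for whatever modularity algorithm the authors used; and k is chosen as the value whose partition has the highest modularity, which is one plausible reading of "parameter-free" but is not confirmed by the abstract. All function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple

def similarity_from_points(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assumed similarity: negative Euclidean distance (larger means closer)."""
    return [[-math.dist(p, q) for q in points] for p in points]

def knn_graph(sim: List[List[float]], k: int) -> List[Set[int]]:
    """Symmetric, unweighted kNN graph: i and j are linked if either is among
    the other's k most similar nodes. Ties are broken by node index."""
    n = len(sim)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def modularity(adj: List[Set[int]], labels: List[int]) -> float:
    """Newman-Girvan modularity Q = sum_c [L_c/m - (d_c/2m)^2]."""
    two_m = sum(len(nbrs) for nbrs in adj)
    if two_m == 0:
        return 0.0
    intra: Dict[int, int] = {}  # twice the number of edges inside community c
    deg: Dict[int, int] = {}    # total degree d_c of community c
    for i, nbrs in enumerate(adj):
        c = labels[i]
        deg[c] = deg.get(c, 0) + len(nbrs)
        intra[c] = intra.get(c, 0) + sum(1 for j in nbrs if labels[j] == c)
    return sum(intra[c] / two_m - (deg[c] / two_m) ** 2 for c in deg)

def local_moving(adj: List[Set[int]], max_sweeps: int = 100) -> List[int]:
    """Single-level Louvain-style local moving: move each node to the neighbouring
    community with the largest modularity gain until no node moves."""
    n = len(adj)
    labels = list(range(n))          # start from singleton communities
    deg = [len(nbrs) for nbrs in adj]
    two_m = sum(deg)
    if two_m == 0:
        return labels
    tot = deg[:]                     # tot[c]: total degree of community c
    for _ in range(max_sweeps):
        moved = False
        for i in range(n):
            ci = labels[i]
            tot[ci] -= deg[i]        # take i out of its current community
            links: Dict[int, int] = {}  # edges from i into each neighbouring community
            for j in adj[i]:
                links[labels[j]] = links.get(labels[j], 0) + 1
            # Gain of inserting i into c is proportional to k_i,in(c) - tot[c] * k_i / 2m.
            best_c = ci
            best_gain = links.get(ci, 0) - tot[ci] * deg[i] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * deg[i] / two_m
                if gain > best_gain + 1e-12:
                    best_c, best_gain = c, gain
            tot[best_c] += deg[i]
            if best_c != ci:
                labels[i] = best_c
                moved = True
        if not moved:
            break
    return labels

def best_knn_partition(
    sim: List[List[float]], k_values: Optional[Iterable[int]] = None
) -> Tuple[int, float, List[int]]:
    """Assumed selection rule: try each k and keep the kNN graph whose
    partition has the highest modularity."""
    n = len(sim)
    best: Optional[Tuple[int, float, List[int]]] = None
    for k in (k_values if k_values is not None else range(1, n)):
        adj = knn_graph(sim, k)
        labels = local_moving(adj)
        q = modularity(adj, labels)
        if best is None or q > best[1]:
            best = (k, q, labels)
    if best is None:
        raise ValueError("no candidate k values")
    return best

if __name__ == "__main__":
    # Two well-separated groups of four points each.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    k, q, labels = best_knn_partition(similarity_from_points(pts))
    print(k, round(q, 3), labels)
```

On this toy input the scan should select k = 3, where the kNN graph is two disjoint 4-cliques and Q = 0.5. Scanning every k from 1 to n - 1 with a brute-force neighbour search is only practical for small data; a fuller sketch would use a spatial index and full Louvain with community aggregation.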
