A Fully Automated Method for Discovering Community Structures in High Dimensional Data

Abstract

Identifying modules, or natural communities, in large complex networks is fundamental in many fields, including the social sciences, biological sciences, and engineering. Recently, several methods have been developed to automatically identify communities in complex networks by optimizing the modularity function. The advantage of this type of approach is that the algorithm does not require any parameter to be tuned. However, modularity-based methods for community discovery assume that the network structure is given explicitly and is correct. In addition, these methods work best if the network is unweighted and/or sparse. In reality, networks are often not directly defined, or may be given as an affinity matrix. In the first case, each node of the network is defined as a point in a high dimensional space, and different network construction methods yield different networks and therefore different community structures. In the second case, an affinity matrix may define a dense weighted graph, for which modularity-based methods do not perform well. In this work, we propose a very simple algorithm to automatically identify community structures from these two types of data. Our approach uses a k-nearest-neighbor network construction method to capture the topology embedded in high dimensional data, and applies a modularity-based algorithm to identify the optimal community structure. A key to our approach is that the network construction is integrated into the community identification process and is totally parameter-free. Furthermore, our method can suggest appropriate preprocessing / normalization of the data to improve the results of community identification. We tested our method on several synthetic and real data sets, and evaluated its performance using internal or external accuracy indices. Compared with several existing approaches, our method is not only fully automatic, but also has the best accuracy overall.
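
The two building blocks named in the abstract, k-nearest-neighbor network construction followed by modularity-based community detection, can be illustrated with a minimal Python sketch. This is not the author's algorithm: the abstract does not say how the two steps are coupled or how k is chosen without a parameter, so the sketch simply scans a caller-supplied list of k values and reports the modularity obtained for each. Euclidean distance, the naive greedy merging heuristic, and all function names (knn_graph, modularity, greedy_communities, scan_k) are assumptions made for illustration only.

```python
import math

def knn_graph(points, k):
    """Undirected, unweighted k-nearest-neighbor graph as a set of (i, j) edges with i < j.

    Assumes Euclidean distance; an edge exists if either endpoint lists the other
    among its k nearest neighbors.
    """
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def modularity(edges, labels):
    """Modularity Q = sum over communities c of e_c/m - (d_c / (2m))^2."""
    m = len(edges)
    inside, degree = {}, {}
    for i, j in edges:
        for v in (i, j):
            degree[labels[v]] = degree.get(labels[v], 0) + 1
        if labels[i] == labels[j]:
            inside[labels[i]] = inside.get(labels[i], 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

def greedy_communities(n, edges):
    """Naive agglomerative heuristic (a stand-in, not the paper's optimizer).

    Starts with one community per node and repeatedly merges the connected pair
    of communities with the largest positive gain in Q.
    """
    labels = list(range(n))
    m = len(edges)
    while True:
        degree, between = {}, {}
        for i, j in edges:
            a, b = labels[i], labels[j]
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
            if a != b:
                key = (min(a, b), max(a, b))
                between[key] = between.get(key, 0) + 1
        if not between:
            return labels
        # Gain from merging a and b: e_ab/m - d_a*d_b / (2 m^2).
        gains = {ab: e / m - degree[ab[0]] * degree[ab[1]] / (2 * m * m)
                 for ab, e in between.items()}
        (a, b), gain = max(gains.items(), key=lambda kv: kv[1])
        if gain <= 0:
            return labels
        labels = [a if c == b else c for c in labels]

def scan_k(points, ks):
    """For each k, build the kNN graph, detect communities, and report (k, #communities, Q)."""
    results = []
    for k in ks:
        edges = knn_graph(points, k)
        labels = greedy_communities(len(points), edges)
        results.append((k, len(set(labels)), modularity(edges, labels)))
    return results

# Two small, well-separated groups of points in the plane.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
for k, n_communities, q in scan_k(points, [2, 3]):
    print(k, n_communities, round(q, 3))
```

Comparing raw modularity across different k is only a rough guide, because sparser graphs tend to score higher; any real selection rule would need the criterion described in the paper itself.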

Bibliographic details

Source: International Conference on Data Mining, 2009, 6 pages
Author: Jianhua Ruan
Original format: PDF
CLC classification: TP274.2-53
Keywords: Community structure; Modularity; Image clustering

