
A Fully Automated Method for Discovering Community Structures in High Dimensional Data


Abstract

Identifying modules, or natural communities, in large complex networks is fundamental in many fields, including the social sciences, biological sciences, and engineering. Recently, several methods have been developed to automatically identify communities in complex networks by optimizing the modularity function. The advantage of this type of approach is that it does not require any parameter to be tuned. However, modularity-based methods for community discovery assume that the network structure is given explicitly and is correct. In addition, these methods work best if the network is unweighted and/or sparse. In reality, networks are often not directly defined, or may be given as an affinity matrix. In the first case, each node of the network is defined as a point in a high-dimensional space, and different network construction methods can yield different networks and therefore different community structures. In the second case, an affinity matrix may define a dense weighted graph, for which modularity-based methods do not perform well. In this work, we propose a very simple algorithm to automatically identify community structures from these two types of data. Our approach uses a k-nearest-neighbor network construction method to capture the topology embedded in high-dimensional data, and applies a modularity-based algorithm to identify the optimal community structure. A key feature of our approach is that network construction is integrated with the community identification process and is entirely parameter-free. Furthermore, our method can suggest appropriate preprocessing/normalization of the data to improve the results of community identification. We tested our method on several synthetic and real data sets, and evaluated its performance using internal or external accuracy indices. Compared with several existing approaches, our method is not only fully automatic, but also has the best overall accuracy.
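The two building blocks named in the abstract, k-nearest-neighbor network construction and modularity-based community identification, can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the function names (`knn_graph`, `modularity`, `greedy_communities`) are hypothetical, the naive greedy merging is a stand-in for whichever modularity optimizer the authors use, and the abstract does not say how the neighborhood size k is chosen, so the sketch only reports the modularity obtained for a few values of k.

```python
import math


def knn_graph(points, k):
    """Build an undirected, unweighted k-nearest-neighbor graph.

    Returns a set of edges (i, j) with i < j. An edge is kept if either
    endpoint lists the other among its k nearest neighbors.
    """
    edges = set()
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance (ties broken by index).
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def modularity(edges, labels):
    """Newman-Girvan modularity: Q = sum_c [ e_c / m - (d_c / 2m)^2 ]."""
    m = len(edges)
    if m == 0:
        return 0.0
    intra = {}   # number of edges inside each community
    degree = {}  # summed node degree of each community
    for i, j in edges:
        degree[labels[i]] = degree.get(labels[i], 0) + 1
        degree[labels[j]] = degree.get(labels[j], 0) + 1
        if labels[i] == labels[j]:
            intra[labels[i]] = intra.get(labels[i], 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def greedy_communities(n, edges):
    """Naive agglomerative modularity optimization (illustrative stand-in).

    Start from singletons and repeatedly apply the single merge of two
    connected communities that raises Q the most; stop when none helps.
    """
    labels = list(range(n))
    best_q = modularity(edges, labels)
    while True:
        best_merge = None
        # Only communities joined by at least one edge are merge candidates.
        pairs = {
            (min(labels[i], labels[j]), max(labels[i], labels[j]))
            for i, j in edges
            if labels[i] != labels[j]
        }
        for a, b in sorted(pairs):
            trial = [a if lab == b else lab for lab in labels]
            q = modularity(edges, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:
            return labels, best_q
        labels = best_merge


if __name__ == "__main__":
    # Toy data (made up for this sketch): two well-separated unit squares.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    for k in (1, 2, 3):
        labels, q = greedy_communities(len(pts), knn_graph(pts, k))
        print(f"k={k}: {len(set(labels))} communities, Q={q:.3f}")
```

The graph is kept unweighted and sparse on purpose, since the abstract notes that modularity-based methods work best in that setting. For data of realistic size, an established graph library would be a better choice than this quadratic-time sketch.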
