首页> 外文OA文献 >Complex information networks – detecting community structure in bipartite networks
【2h】

Complex information networks – detecting community structure in bipartite networks

机译:复杂的信息网络 - 检测二分网络中的社区结构

摘要

The last decade has witnessed great expansion in research and study of complex networks. A complex network is a large-scale network that reflects the interactions between objects or components of complicated systems. These components, known as clusters, communities or modules, perform together in order to provide one or more functions of the system. A vast number of systems, from the brain to ecosystems, power grids and the Internet, criminal relationships and financial transactions, can all be described as large complex networks. For most complex networks, the complexity arises from the fact that the structure is highly irregular, complex and dynamically evolving in time; and that the observed patterns of interactions highly influence the behaviour of the entire system. One of the topological properties that can expose the hierarchical structure of complex networks is community structure. Community detection is a common problem in complex networks that consists in general of finding groups of densely connected nodes with few connections to nodes outside of a group. The lack of consensus on a definition for a community leads to extensive studies on community structure of complex networks in order to provide improved community detection methods. Community structure is a common and important topological characteristic of many real world complex networks. In particular, identifying communities in bipartite networks is an important task in many scientific domains. In a bipartite network, the node set consists of two disjoint sets of nodes, primary set (P) and secondary set (S), such that links between nodes may occur only if the nodes belong to different sets. There are really two approaches to identifying clusters in a bipartite network: the first, and more common, is when our real interest is in community structure within the primary node set P; and the second is when our real interest is in bipartite communities within the whole network. Thus, in this research we investigate and study the state-of-the-art of community detection algorithms, in particular, those to identify the communities in bipartite networks in order to provide us with a more complete understanding of the relationship between communities. The practical aim is to derive a coarse-grain description of the network topology that will aid understanding of its hierarchical structure. The research of the thesis consists of four main phases. First, one of the best algorithms for community detection in classical networks, Infomap, has not been adapted to the big and important class of bipartite networks. This research gap is one focus of the thesis. We integrate the weighted projection method for bipartite networks based on common neighbors similarity into Infomap, to acquire a weighted one mode network that can be clustered by this random walks technique. We apply this method to a number of real world bipartite networks, to detect significant community structure. We measure the performance of our approach based on the ground truth. This requires deep knowledge of the formation of relations within and between clusters in these real world networks. Although such investigation is excessively time consuming, and impractical or impossible in large networks, the result is much more accurate and more meaningful and gives us confidence that this method can be usefully applied to large networks where ground truth is not known. Second, several possible edge additions are conducted to test how random walks based algorithm, Infomap, performs when the minimal modification is made to convert a bipartite network to a nearly bipartite (but unipartite) network. The experiments on small bipartite networks obtain encouraging results. Third, we shift focus from community detection based on random walks to community detection based on the strongest communities possible in a bipartite network, which are bicliques. We develop a novel algorithm to identify overlapping communities at the base level of hierarchy in bipartite networks. We combine existing techniques (bicliques, cliques, structural equivalence) into a novel method to solve this new research problem. We classify the output communities into 5 categories based on community strength. From this base level, we apply the Jaccard index as a threshold in order to reduce the redundancy of overlapping communities, to obtain higher levels of the hierarchy. We compare results from our overlapping approach with other concurrent approaches not only directly to the ground truth, but also using a widely accepted scale for evaluating the quality of partitions, Normalized Mutual Information (NMI). In the last phase of the thesis, a large financial bipartite network collected during 6 months fieldwork is analysed and tested in order to reveal its hierarchical structure. We apply all methods presented in Chapter 3, Chapter 4 and Chapter 5. The main contribution of this thesis is an improved method to detect the hierarchical and overlapping community structure in bipartite complex networks based on structural equivalence of nodes. More generally, it aims to derive a coarse-grain depiction of real large-scale networks through structural properties of their identified communities as well as their performance with respect to the known ground truth.
机译:过去十年见证了复杂网络研究的巨大发展。复杂网络是反映复杂系统对象或组件之间相互作用的大规模网络。这些组件(称为集群,社区或模块)一起执行以提供系统的一个或多个功能。从大脑到生态系统,电网和互联网,犯罪关系和金融交易的大量系统都可以描述为大型复杂网络。对于大多数复杂的网络,其复杂性是由于结构高度不规则,复杂且随时间动态变化而产生的。并且观察到的交互模式会极大地影响整个系统的行为。可以揭示复杂网络的层次结构的拓扑属性之一是社区结构。社区检测是复杂网络中的常见问题,通常包括找到密集连接的节点组,而与组外节点的连接很少。对于社区定义缺乏共识导致对复杂网络的社区结构进行了广泛的研究,以提供改进的社区检测方法。社区结构是许多现实世界中复杂网络的共同且重要的拓扑特征。特别是,在两方网络中识别社区是许多科学领域的重要任务。在双向网络中,节点集由两个不相交的节点集组成,即主要集(P)和次要集(S),因此,只有当节点属于不同集时,节点之间的链接才可能发生。在双向网络中,实际上有两种识别集群的方法:第一种,也是更为常见的,当我们真正感兴趣的是主节点集P内的社区结构时;第二个是我们真正的兴趣是整个网络中的两方社区。因此,在这项研究中,我们研究和研究了最新的社区检测算法,特别是那些用于识别两方网络中的社区的算法,以便为我们提供对社区之间关系的更完整的理解。实际目的是获得对网络拓扑的粗粒度描述,这将有助于理解其层次结构。论文的研究分为四个主要阶段。首先,经典网络中用于社区检测的最佳算法之一Infomap尚未适应大型且重要的双向网络。这一研究空白是本文的重点之一。我们将基于共同邻居相似度的二分网络加权投影方法集成到Infomap中,以获取可以通过这种随机游走技术进行聚类的加权单模网络。我们将此方法应用于许多现实世界的双向网络,以检测重要的社区结构。我们根据基本事实评估我们的方法的效果。这需要对这些现实网络中的集群内部以及集群之间的关系形成有深入的了解。尽管这种调查非常耗时,并且在大型网络中不切实际或不可能,但是结果要准确得多,也更有意义,这使我们相信,该方法可以有效地应用于未知地面事实的大型网络。其次,进行了几种可能的边缘加法运算,以测试在进行最小程度的修改以将双向网络转换为几乎双向(但单方)的网络时,基于随机游走的算法Infomap如何执行。在小型二分网络上进行的实验获得了令人鼓舞的结果。第三,我们将重点从基于随机游走的社区检测转移到基于二分网络中可能最强的社区(即双斜体)的社区检测。我们开发了一种新颖的算法,可以在双向网络的层次结构的基础级别上识别重叠的社区。我们将现有技术(bicliques,集团,结构等效性)组合成一种新颖的方法来解决这一新的研究问题。根据社区实力,我们将输出社区分为5类。在此基础级别上,我们将Jaccard索引用作阈值,以减少重叠社区的冗余,以获得更高层次的层次结构。我们将重叠方法与其他并行方法的结果进行比较,不仅直接针对基本事实,而且还使用广泛接受的量表来评估分区的质量,即标准化互信息(NMI)。在论文的最后阶段,对六个月的野外工作中收集到的大型金融双向网络进行了分析和测试,以揭示其层次结构。我们应用第3章介绍的所有方法,第4章和第5章。本文的主要贡献是一种改进的方法,该方法基于节点的结构等效性来检测二分复杂网络中的分层和重叠社区结构。更广泛地讲,它旨在通过其识别出的社区的结构特性及其相对于已知基本事实的性能来得出真实大型网络的粗粒度描述。

著录项

  • 作者

    Alzahrani T;

  • 作者单位
  • 年度 2016
  • 总页数
  • 原文格式 PDF
  • 正文语种
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号