...
首页> 外文期刊>Data mining and knowledge discovery >Active exploration for large graphs
【24h】

Active exploration for large graphs

机译:积极探索大型图

获取原文
获取原文并翻译 | 示例

摘要

Modern information networks, such as social networks, communication networks, and citation networks, are often characterized by very large sizes and dynamically changing structures. Common solutions to graph mining tasks (e.g., node classification) usually employ an unrestricted sampling-then-mining paradigm to reduce a large network to a manageable size, followed by subsequent mining tasks. However, real-world networks may be unaccessible at once and must be crawled progressively. This can be due to the fact that the size of the network is too large, or some privacy/legal concerns. In this paper, we propose an Active Exploration framework for large graphs, where the goal is to simultaneously carry out network sampling and node labeling in order to build a sampled network from which the trained classifier can have the maximum node classification accuracy. To achieve this goal, we consider a network as a Markov chain and compute the stationary distribution of the nodes by deriving supervised random walks. The stationary distribution helps identify specific nodes to be sampled in the next step, and the labeling process labels the most informative node which in turn strengthens the sampling of the network. To improve the scalability of active exploration for large graphs, we also propose a more efficient multi-seed algorithm that simultaneously runs multiple, parallel exploration processes, and makes joint decisions to determine which nodes are to be sampled and labeled next. The simultaneous, mutually enhanced sampling and labeling processes ensure that the final sampled network contains a maximum number of nodes directly related to the underlying mining tasks. Experiments on both synthetic and real-world networks demonstrate that our active exploration algorithms have much better chance to include target nodes in the sampled networks than baseline methods.
机译:诸如社交网络,通信网络和引用网络之类的现代信息网络通常以非常大的规模和动态变化的结构为特征。图形挖掘任务的常见解决方案(例如节点分类)通常采用不受限制的采样-然后-挖掘范式来将大型网络缩减为可管理的大小,然后再进行后续挖掘任务。但是,现实世界的网络可能一次无法访问,必须逐步爬网。这可能是由于网络规模太大或某些隐私/法律问题引起的。在本文中,我们提出了一个针对大型图的Active Exploration框架,其目的是同时进行网络采样和节点标记,以构建一个采样网络,训练过的分类器可以从该网络中获得最大的节点分类精度。为了实现此目标,我们将网络视为马尔可夫链,并通过导出有监督的随机游走来计算节点的静态分布。固定分布有助于确定下一步要采样的特定节点,并且标记过程会标记信息最丰富的节点,从而增强网络的采样。为了提高大型图的主动探索的可伸缩性,我们还提出了一种更有效的多种子算法,该算法同时运行多个并行探索过程,并做出联合决策以确定接下来要采样和标记哪些节点。同时相互增强的采样和标记过程可确保最终的采样网络包含与基础挖掘任务直接相关的最大节点数。在合成网络和实际网络上进行的实验表明,与基线方法相比,我们的主动探索算法在采样网络中包含目标节点的机会要大得多。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号