首页> 外文期刊>Pattern Recognition: The Journal of the Pattern Recognition Society >Semi-supervised classification and betweenness computation on large, sparse, directed graphs
【24h】

Semi-supervised classification and betweenness computation on large, sparse, directed graphs

机译:大型,稀疏,有向图的半监督分类和中介计算

获取原文
获取原文并翻译 | 示例
           

摘要

This work addresses graph-based semi-supervised classification and betweenness computation in large, sparse, networks (several millions of nodes). The objective of semi-supervised classification is to assign a label to unlabeled nodes using the whole topology of the graph and the labeling at our disposal. Two approaches are developed to avoid explicit computation of pairwise proximity between the nodes of the graph, which would be impractical for graphs containing millions of nodes. The first approach directly computes, for each class, the sum of the similarities between the nodes to classify and the labeled nodes of the class, as suggested initially in [1,2]. Along this approach, two algorithms exploiting different state-of-the-art kernels on a graph are developed. The same strategy can also be used in order to compute a betweenness measure. The second approach works on a trellis structure built from biased random walks on the graph, extending an idea introduced in [3]. These random walks allow to define a biased bounded betweenness for the nodes of interest, defined separately for each class. All the proposed algorithms have a linear computing time in the number of edges while providing good results, and hence are applicable to large sparse networks. They are empirically validated on medium-size standard data sets and are shown to be competitive with state-of-the-art techniques. Finally, we processed a novel data set, which is made available for benchmarking, for multi-class classification in a large network: the U.S. patents citation network containing 3M nodes (of six different classes) and 38M edges. The three proposed algorithms achieve competitive results (around 85% classification rate) on this large networkthey classify the unlabeled nodes within a few minutes on a standard workstation.
机译:这项工作解决了大型稀疏网络(数百万个节点)中基于图的半监督分类和中介计算。半监督分类的目的是使用图的整个拓扑结构和我们可以使用的标签为未标签的节点分配标签。开发了两种方法来避免显式计算图的节点之间的成对邻近度,这对于包含数百万个节点的图是不切实际的。第一种方法直接为每个类计算要分类的节点与该类的标记节点之间的相似度之和,如[1,2]最初建议的那样。沿着这种方法,开发了两种利用图上不同的最新内核的算法。为了计算中间性度量,也可以使用相同的策略。第二种方法适用于由图上的有偏随机游动构建的网格结构,扩展了[3]中引入的思想。这些随机游走允许为每个类别分别定义的目标节点定义一个有界的有界中间值。所有提出的算法在边缘数量上都具有线性计算时间,同时提供了良好的结果,因此适用于大型稀疏网络。它们在中型标准数据集上进行了经验验证,并显示出与最新技术的竞争力。最后,我们处理了一个新颖的数据集,该数据集可用于基准测试,用于大型网络中的多类别分类:包含3M个节点(六个不同类别)和38M个边的美国专利引用网络。提出的三种算法在这个大型网络上获得了竞争性的结果(大约85%的分类率),它们可以在几分钟内在标准工作站上对未标记的节点进行分类。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号