Conference: ACM KDD International Conference on Knowledge Discovery and Data Mining (KDD 2008)
Using Ghost Edges for Classification in Sparsely Labeled Networks

Abstract

We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse.

In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.
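The abstract does not say how ghost edges are chosen or weighted, so the following is only a minimal sketch under stated assumptions. It assumes that the weight of a ghost edge between a labeled node and an unlabeled node is a random-walk-with-restart proximity score, and that each unlabeled node is classified by a weighted vote over its ghost edges. The function names (`rwr_scores`, `ghost_edges`, `classify`), the restart probability, and the iteration count are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def rwr_scores(adj, seed, restart=0.15, iters=50):
    """Approximate random-walk-with-restart proximity from `seed` to all nodes.

    Assumption: this proximity measure is an illustrative choice; the abstract
    does not specify one. Each iteration touches every edge once.
    """
    scores = {node: 0.0 for node in adj}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        for u, neighbors in adj.items():
            if scores[u] == 0.0 or not neighbors:
                continue
            # Spread the non-restarting mass evenly over u's neighbors.
            share = (1.0 - restart) * scores[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        # The walker jumps back to the seed with probability `restart`.
        nxt[seed] += restart
        scores = nxt
    return scores


def ghost_edges(adj, labels):
    """Return {unlabeled node: {labeled node: weight}} (hypothetical structure)."""
    ghosts = defaultdict(dict)
    for seed in labels:  # one proximity pass per labeled node
        for node, weight in rwr_scores(adj, seed).items():
            if node not in labels and weight > 0.0:
                ghosts[node][seed] = weight
    return ghosts


def classify(adj, labels):
    """Label each unlabeled node by a weighted vote over its ghost edges."""
    predictions = {}
    for node, edges in ghost_edges(adj, labels).items():
        votes = defaultdict(float)
        for seed, weight in edges.items():
            votes[labels[seed]] += weight
        predictions[node] = max(votes, key=votes.get)
    return predictions


# Toy path graph a-b-c-d-e-f with only the two end nodes labeled.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
labels = {"a": "x", "f": "y"}
print(classify(adj, labels))
```

In this sketch, nodes `c` and `d` have no labeled neighbors at all, yet both still receive a prediction through their ghost edges. The cost is one proximity pass per labeled node, and each pass does a fixed number of sweeps over the edges, which is at least consistent with the L · E running time stated in the abstract.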
