Conference: SIAM International Conference on Data Mining

A Generative Model with Network Regularization for Semi-Supervised Collective Classification

Abstract

In recent years, much effort has been devoted to Collective Classification (CC) techniques for predicting the labels of linked instances. Given a large amount of labeled data, conventional CC algorithms make use of local labeled neighbours to increase accuracy. However, in many real-world applications, labeled data are limited and very expensive to obtain. In this situation, most of the data have no connection to labeled data, and supervision knowledge cannot be obtained from the local connections. Recently, Semi-Supervised Collective Classification (SSCC) has been examined as a way to leverage unlabeled data to enhance the classification performance of CC. In this paper, we propose a probabilistic generative model with network regularization (GMNR) for SSCC. Our main idea is to compute label probability distributions for unlabeled instances by maximizing both the log-likelihood in the generative model and the label smoothness on the network topology of the data. The proposed generative model is based on the Probabilistic Latent Semantic Analysis (PLSA) method and uses the attribute features of all instances. A network regularizer is employed to smooth the label probability distributions over the network topology of the data. Finally, we develop an effective EM algorithm to compute the label probability distributions for label prediction. Experimental results on three real sparsely-labeled network datasets show that the proposed GMNR model outperforms state-of-the-art CC algorithms and other SSCC algorithms.
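
The abstract does not spell out the objective, so the following is only a rough sketch of the general idea it describes: a PLSA-style log-likelihood combined with a penalty that rewards linked instances for having similar label distributions. The quadratic form of the penalty, the weight `lam`, and all function and variable names are assumptions made for illustration; the regularizer and EM updates actually used by GMNR may differ.

```python
import math


def log_likelihood(counts, p_label_given_doc, p_word_given_label):
    """PLSA-style log-likelihood: sum_d sum_w n(d, w) * log sum_k P(w|k) P(k|d).

    Assumes the mixture probability is positive wherever a count is nonzero.
    """
    num_labels = len(p_word_given_label)
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n == 0:
                continue  # zero counts contribute nothing
            mix = sum(
                p_label_given_doc[d][k] * p_word_given_label[k][w]
                for k in range(num_labels)
            )
            total += n * math.log(mix)
    return total


def smoothness_penalty(p_label_given_doc, edges):
    """Assumed quadratic penalty: half the squared difference between the
    label distributions of every pair of linked instances."""
    return 0.5 * sum(
        sum((a - b) ** 2 for a, b in zip(p_label_given_doc[i], p_label_given_doc[j]))
        for i, j in edges
    )


def regularized_objective(counts, p_label_given_doc, p_word_given_label,
                          edges, lam=1.0):
    """Log-likelihood minus a weighted smoothness penalty (hypothetical form)."""
    return (
        log_likelihood(counts, p_label_given_doc, p_word_given_label)
        - lam * smoothness_penalty(p_label_given_doc, edges)
    )
```

Under these assumptions, an EM-style procedure would alternate between estimating label posteriors and updating the distributions so that this combined quantity increases; the sketch only evaluates the objective and does not implement those updates.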
