Journal: Big Data Research

Multi-Label Regularized Generative Model for Semi-Supervised Collective Classification in Large-Scale Networks



Abstract

The problem of collective classification (CC) for large-scale network data has received considerable attention in the last decade. Enabling CC usually increases accuracy when given a fully labeled network with a large amount of labeled data. However, such labels can be difficult to obtain, and learning a CC model from only a few labels in a large-scale, sparsely labeled network can lead to poor performance. In this paper, we show that leveraging the unlabeled portion of the data through semi-supervised collective classification (SSCC) is essential to achieving high performance. First, we describe a novel data-generating algorithm, called the generative model with network regularization (GMNR), to exploit both labeled and unlabeled data in large-scale, sparsely labeled networks. In GMNR, a network regularizer is constructed to encode the network structure information, and we apply this regularizer to smooth the probability density functions of the generative model. Second, we extend the proposed GMNR algorithm to handle network data consisting of multi-label instances. This approach, called the multi-label regularized generative model (MRGM), includes an additional label regularizer to encode the label correlation, and we show how these smoothing regularizers can be incorporated into the objective function of the model to improve the performance of CC in the multi-label setting. We then develop an optimization scheme, based on the EM algorithm, to solve the objective function. Empirical results on several real-world network data classification tasks show that the proposed methods outperform the compared collective classification algorithms, especially when labeled data is scarce.
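The abstract gives no equations, so the following is only a minimal sketch of the general idea it describes (EM on a generative model whose class posteriors are smoothed over the network), not the GMNR or MRGM algorithm itself. It assumes a Bernoulli naive Bayes model over binary node features and a simple neighbor-averaging step standing in for the network regularizer; the function name `sscc_em` and the parameters `lam` and `alpha` are hypothetical.

```python
import math

def sscc_em(features, edges, labels, n_classes, n_iter=20, lam=0.5, alpha=1.0):
    """Semi-supervised EM for Bernoulli naive Bayes with neighbor smoothing.

    features: list of 0/1 feature lists, one per node
    edges:    list of undirected (i, j) node pairs
    labels:   dict mapping labeled node index -> class index
    lam:      weight in [0, 1] on the neighbor average (assumed knob)
    alpha:    Laplace smoothing constant (assumed knob)
    Returns one class-posterior list per node.
    """
    n, d = len(features), len(features[0])
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def clamp(post):
        # Labeled nodes always keep their known class.
        for i, c in labels.items():
            post[i] = [1.0 if k == c else 0.0 for k in range(n_classes)]
        return post

    # Start: labeled nodes are one-hot, unlabeled nodes are uniform.
    post = clamp([[1.0 / n_classes] * n_classes for _ in range(n)])

    for _ in range(n_iter):
        # M-step: Laplace-smoothed class priors and Bernoulli parameters.
        mass = [sum(post[i][k] for i in range(n)) for k in range(n_classes)]
        prior = [(mass[k] + alpha) / (n + alpha * n_classes)
                 for k in range(n_classes)]
        theta = [[(sum(post[i][k] * features[i][f] for i in range(n)) + alpha)
                  / (mass[k] + 2 * alpha) for f in range(d)]
                 for k in range(n_classes)]

        # E-step: posterior of each node under the generative model alone.
        local = []
        for i in range(n):
            logp = []
            for k in range(n_classes):
                s = math.log(prior[k])
                for f in range(d):
                    p = theta[k][f]
                    s += math.log(p if features[i][f] else 1.0 - p)
                logp.append(s)
            m = max(logp)
            w = [math.exp(v - m) for v in logp]
            z = sum(w)
            local.append([v / z for v in w])

        # Network smoothing: blend each posterior with its neighbors' mean.
        new_post = []
        for i in range(n):
            if nbrs[i]:
                avg = [sum(post[j][k] for j in nbrs[i]) / len(nbrs[i])
                       for k in range(n_classes)]
                new_post.append([(1 - lam) * local[i][k] + lam * avg[k]
                                 for k in range(n_classes)])
            else:
                new_post.append(local[i])
        post = clamp(new_post)
    return post
```

Setting `lam=0` in this sketch reduces it to plain semi-supervised naive Bayes EM, which makes it easy to see what the network term contributes when only a few nodes are labeled.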
