Journal: IEEE Transactions on Knowledge and Data Engineering

Scalable Graph-Based Semi-Supervised Learning through Sparse Bayesian Model

Abstract

Semi-supervised learning (SSL) concerns how to improve a classifier's performance by exploiting prior knowledge from unlabeled data. In recent years, many SSL methods have been developed that integrate unlabeled data into the classifier under either the manifold or the cluster assumption. In particular, graph-based approaches, which follow the manifold assumption, have achieved promising performance in many real-world applications. However, most of them work well only on small-scale data sets and lack probabilistic outputs. In this paper, a scalable graph-based SSL framework built on a sparse Bayesian model is proposed by defining a graph-based sparse prior. Using traditional Bayesian inference, a sparse Bayesian SSL algorithm (SBS$^2$L) is obtained, which can remove irrelevant unlabeled samples and make probabilistic predictions for out-of-sample data. Moreover, to scale SBS$^2$L to large data sets, an incremental variant (ISBS$^2$L) is derived. The key idea of ISBS$^2$L is to employ an incremental strategy that sequentially selects the subset of unlabeled samples that contribute to learning, instead of using all available unlabeled samples directly. ISBS$^2$L has lower time and space complexity than previous SSL algorithms that use all unlabeled samples. Extensive experiments on various data sets verify that our algorithms achieve comparable classification effectiveness and efficiency with much better scalability. Finally, a generalization error bound is derived based on robustness analysis.
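For readers unfamiliar with the graph-based setting the abstract refers to, the following is a minimal sketch of a classic baseline, harmonic-function label propagation on a dense Gaussian affinity graph. It is not SBS$^2$L or ISBS$^2$L: it has no sparse prior, gives no calibrated probabilities, and uses every unlabeled sample through an n-by-n matrix, which is the kind of scalability limit the abstract describes. The function names, the `sigma` bandwidth, and the `-1` convention for unlabeled points are all assumptions made for this illustration.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """Dense Gaussian affinity matrix with a zero diagonal (hypothetical helper)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def harmonic_label_propagation(X, y, sigma=1.0):
    """Propagate labels over the graph.

    y holds class labels for labeled points and -1 for unlabeled ones
    (an assumed convention). Returns (F, classes), where F is an
    (n, n_classes) matrix of soft label scores.
    """
    y = np.asarray(y)
    labeled = y >= 0
    unlabeled = ~labeled
    classes = np.unique(y[labeled])

    W = rbf_affinity(X, sigma)
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian

    # One-hot encoding of the known labels.
    Y_l = (y[labeled][:, None] == classes[None, :]).astype(float)

    # Harmonic solution on the unlabeled block: L_uu F_u = W_ul Y_l.
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    W_ul = W[np.ix_(unlabeled, labeled)]
    F_u = np.linalg.solve(L_uu, W_ul @ Y_l)

    F = np.zeros((len(y), len(classes)))
    F[labeled] = Y_l
    F[unlabeled] = F_u
    return F, classes

# Two well-separated clusters with one labeled point each.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]])
y = np.array([0, -1, -1, 1, -1, -1])
F, classes = harmonic_label_propagation(X, y)
pred = classes[F.argmax(axis=1)]
print(pred)
```

Both the dense affinity matrix and the linear solve grow with the total number of samples, so this baseline becomes impractical on large data sets. That cost is the motivation for selecting only a subset of unlabeled samples, as the abstract describes for ISBS$^2$L.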