IEEE Transactions on Knowledge and Data Engineering

A Graph-Based Consensus Maximization Approach for Combining Multiple Supervised and Unsupervised Models


Abstract

Ensemble learning has emerged as a powerful method for combining multiple models. Well-known methods, such as bagging, boosting, and model averaging, have been shown to improve accuracy and robustness over single models. However, due to the high costs of manual labeling, it is hard to obtain sufficient and reliable labeled data for effective training. Meanwhile, lots of unlabeled data exist in these sources, and we can readily obtain multiple unsupervised models. Although unsupervised models do not directly generate a class label prediction for each object, they provide useful constraints on the joint predictions for a set of related objects. Therefore, incorporating these unsupervised models into the ensemble of supervised models can lead to better prediction performance. In this paper, we study ensemble learning with outputs from multiple supervised and unsupervised models, a topic where little work has been done. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the predictions over the graph, but penalizes the deviations from the initial labeling provided by the supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes and prove the optimality of the solution. The proposed method can be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on different applications with heterogeneous data sources demonstrate the benefits of the proposed method over existing alternatives. (More information, data, and code are available at http://www.cse.buffalo.edu/~jing/integrate.htm.)
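To make the optimization concrete, below is a minimal NumPy sketch of the kind of iterative propagation the abstract describes, reconstructed from the abstract alone: objects and model-output groups (the predicted classes of each classifier, the clusters of each clustering) form the two sides of a bipartite graph; each group's probability estimate is averaged from its member objects, with an alpha-weighted pull toward the initial labels of groups produced by supervised models, and each object's estimate is then averaged back from its groups. The function name, the `alpha` parameter, and the exact update form here are illustrative assumptions, not the paper's authoritative formulation; the authors' own code is linked at the end of the abstract.

```python
import numpy as np

def consensus_maximization(A, Y, is_supervised, alpha=2.0, n_iters=100):
    """Sketch of consensus maximization on an object-group bipartite graph.

    A             : (n, v) adjacency; A[i, j] = 1 if object i belongs to
                    group j (a predicted class of a classifier, or a
                    cluster of a clustering).
    Y             : (v, c) initial labels; one-hot rows for groups produced
                    by supervised models, all-zero rows otherwise.
    is_supervised : (v,) 0/1 indicator of groups carrying initial labels.
    alpha         : penalty weight for deviating from the initial labels.
    Returns       : (n, c) consolidated class probability estimates.
    """
    n, v = A.shape
    c = Y.shape[1]
    U = np.full((n, c), 1.0 / c)   # object estimates, start uniform
    Q = np.full((v, c), 1.0 / c)   # group estimates
    k = is_supervised.astype(float)[:, None]

    for _ in range(n_iters):
        # Groups: average of member objects' estimates, pulled toward the
        # initial labeling of supervised groups (smoothness + penalty).
        Q = (A.T @ U + alpha * k * Y) / (A.sum(axis=0)[:, None] + alpha * k)
        # Objects: average of the estimates of the groups they belong to.
        U = (A @ Q) / A.sum(axis=1)[:, None]
    return U

# Toy usage: 4 objects, 2 classes; groups 0-1 come from one classifier
# (so they carry labels), groups 2-3 from one clustering (no labels).
A = np.array([[1, 0, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
Y = np.array([[1, 0], [0, 1], [0, 0], [0, 0]], dtype=float)
U = consensus_maximization(A, Y, is_supervised=np.array([1, 1, 0, 0]))
print(U.argmax(axis=1))   # consensus label for each object
```

Under these assumptions the two alternating averages are the block-coordinate updates of a quadratic objective: a smoothness term that ties each object's estimate to the estimates of the groups it joins, plus an alpha-weighted deviation penalty on the labeled (supervised) groups, which matches the objective sketched in the abstract.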
