Journal: IEEE Transactions on Information Theory

The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter



Abstract

We observe a training set $Q$ composed of $l$ labeled samples $\{(X_1,\theta_1),\ldots,(X_l,\theta_l)\}$ and $u$ unlabeled samples $\{X_1',\ldots,X_u'\}$. The labels $\theta_i$ are independent random variables satisfying $\Pr\{\theta_i=1\}=\eta$, $\Pr\{\theta_i=2\}=1-\eta$. The labeled observations $X_i$ are independently distributed with conditional density $f_{\theta_i}(\cdot)$ given $\theta_i$. Let $(X_0,\theta_0)$ be a new sample, independently distributed as the samples in the training set. We observe $X_0$ and we wish to infer the classification $\theta_0$. In this paper we first assume that the distributions $f_1(\cdot)$ and $f_2(\cdot)$ are given and that the mixing parameter is unknown. We show that the relative value of labeled and unlabeled samples in reducing the risk of optimal classifiers is the ratio of the Fisher informations they carry about the parameter $\eta$. We then assume that two densities $g_1(\cdot)$ and $g_2(\cdot)$ are given, but we do not know whether $g_1(\cdot)=f_1(\cdot)$ and $g_2(\cdot)=f_2(\cdot)$ or if the opposite holds, nor do we know $\eta$. Thus the learning problem consists of both estimating the optimum partition of the observation space and assigning the classifications to the decision regions. Here, we show that labeled samples are necessary to construct a classification rule and that they are exponentially more valuable than unlabeled samples.
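
The Fisher-information comparison in the first part of the abstract can be made concrete with a small numerical sketch. This is an illustration under stated assumptions, not the authors' derivation. It uses the standard definitions: when $f_1$ and $f_2$ are known, a labeled pair $(X,\theta)$ carries information $1/(\eta(1-\eta))$ about $\eta$, because only the label depends on $\eta$; an unlabeled observation, drawn from the mixture $\eta f_1 + (1-\eta) f_2$, carries $\int (f_1-f_2)^2 / (\eta f_1 + (1-\eta) f_2)\,dx$. The Gaussian class densities, the integration range, and every function name below are hypothetical choices made for the example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x (assumed class density for this sketch)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fisher_labeled(eta):
    """Fisher information about eta in one labeled sample, with f1 and f2 known.

    Only the label depends on eta, so this is the Bernoulli(eta) information.
    """
    return 1.0 / (eta * (1.0 - eta))


def fisher_unlabeled(eta, f1, f2, lo, hi, steps=20000):
    """Fisher information about eta in one unlabeled sample.

    Approximates the integral of (f1 - f2)^2 / (eta*f1 + (1-eta)*f2) over
    [lo, hi] with the trapezoidal rule. The range must cover essentially all
    of the mass of both densities.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        a, b = f1(x), f2(x)
        mix = eta * a + (1.0 - eta) * b
        value = (a - b) ** 2 / mix if mix > 0.0 else 0.0
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * value
    return total * h


if __name__ == "__main__":
    eta = 0.3

    def f1(x):
        return gaussian_pdf(x, -1.0, 1.0)

    def f2(x):
        return gaussian_pdf(x, 1.0, 1.0)

    i_l = fisher_labeled(eta)
    i_u = fisher_unlabeled(eta, f1, f2, -12.0, 12.0)
    print("labeled:", i_l, "unlabeled:", i_u, "ratio:", i_l / i_u)
```

In this sketch the ratio `i_l / i_u` is always at least 1. It grows as the two class densities overlap more, since an unlabeled observation then reveals less about which class it came from, and it approaches 1 when the densities are well separated.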
