Learning Probabilistic Linear-Threshold Classifiers via Selective Sampling


Abstract

In this paper we investigate selective sampling, a learning model in which the learner observes a sequence of i.i.d. unlabeled instances, each time deciding whether to query the label of the current instance. We assume that labels are binary and stochastically related to instances via a linear probabilistic function whose coefficients are arbitrary and unknown. We then introduce a new selective sampling rule and show that its expected regret (with respect to the classifier that knows the underlying linear function and observes the label realization after each prediction) grows not much faster than the number of sampled labels. Furthermore, under additional assumptions on the true margin distribution, we prove that the number of sampled labels grows only logarithmically in the number of observed instances. Experiments carried out on a text categorization problem show that: (1) our selective sampling algorithm performs better than the Perceptron algorithm even when the latter is given the true label after each classification; (2) the performance of our algorithm remains the same when it is allowed to observe the true label after each classification. Finally, we note that by expressing our selective sampling rule in dual variables we can learn nonlinear probabilistic functions via the kernel machinery.
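The selective sampling protocol described in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's sampling rule, so the code below substitutes a generic margin-threshold query rule with a Perceptron-style update, purely for illustration. The label model P(y = +1 | x) = (1 + u·x)/2, the unit-norm instances, and the names `run_selective_sampling` and `threshold` are all assumptions of this sketch, not details taken from the paper.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def unit_vector(rng, dim):
    """Draw a random direction on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = dot(v, v) ** 0.5
    return [vi / norm for vi in v]


def draw_label(rng, u, x):
    # Assumed label model (not confirmed by the abstract):
    # P(y = +1 | x) = (1 + u.x) / 2, valid because |u.x| <= 1 here.
    return 1 if rng.random() < (1.0 + dot(u, x)) / 2.0 else -1


def run_selective_sampling(n_rounds=2000, dim=5, threshold=0.5, seed=0):
    """Hypothetical margin-threshold rule; NOT the rule from the paper."""
    rng = random.Random(seed)
    u = unit_vector(rng, dim)      # hidden coefficients, unknown to the learner
    w = [0.0] * dim                # learner's current weight vector
    queries = 0
    mistakes = 0
    for _ in range(n_rounds):
        x = unit_vector(rng, dim)  # i.i.d. unlabeled instance
        margin = dot(w, x)
        prediction = 1 if margin >= 0 else -1
        y = draw_label(rng, u, x)  # drawn by the environment, hidden unless queried
        if prediction != y:
            mistakes += 1          # tallied for evaluation only
        # Query the label only when the learner's margin is small.
        if abs(margin) <= threshold:
            queries += 1
            if prediction != y:    # Perceptron-style update on a queried mistake
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return queries, mistakes, n_rounds


if __name__ == "__main__":
    q, m, n = run_selective_sampling()
    print(f"queried {q} of {n} labels, {m} prediction mistakes")
```

Because the labels are noisy, even the classifier that knows u makes mistakes, which is why the abstract measures regret relative to that classifier rather than counting raw mistakes.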
