IEICE Technical Report: Information-Based Induction Sciences and Machine Learning (電子情報通信学会技術研究報告. 情報論的学習理論と機械学習)

Learning from Positive and Unlabeled Data 2: Computationally Efficient Estimation of Class Priors


Abstract

We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized L1-distance gives a computationally efficient algorithm with an analytic solution, and establish its uniform deviation bound and estimation error bound. Finally, we experimentally demonstrate the usefulness of the proposed method.
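To illustrate the general idea, the following is a minimal, self-contained sketch (not the paper's actual algorithm) of class-prior estimation from positive and unlabeled data: the positive-class density p+(x) and the unlabeled density p(x) are estimated with simple kernel density estimates, and the prior pi is chosen to fit pi * p+(x) to p(x) under an L1-type objective with an extra penalty on the region where pi * p+(x) exceeds p(x). The penalty term here is an illustrative stand-in for the paper's penalized divergence; without some such penalty, partial matching with the L1 distance tends to overestimate the prior because negative samples are absent. All names and the exact objective form below are assumptions for the sketch.

```python
import numpy as np

# Hypothetical 1-D example: positive class ~ N(0, 1), negative class ~ N(4, 1).
rng = np.random.default_rng(0)
true_pi = 0.7
n_pos, n_unl = 2000, 2000

x_pos = rng.normal(0.0, 1.0, n_pos)                      # positive-only samples
is_pos = rng.random(n_unl) < true_pi                     # latent labels (unseen)
x_unl = np.where(is_pos,
                 rng.normal(0.0, 1.0, n_unl),
                 rng.normal(4.0, 1.0, n_unl))            # unlabeled mixture

def kde(samples, grid, h=0.3):
    """Gaussian kernel density estimate evaluated on a grid."""
    d = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * d ** 2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

grid = np.linspace(-5.0, 10.0, 600)
dx = grid[1] - grid[0]
p_plus = kde(x_pos, grid)    # estimate of the positive-class density p+(x)
p_unl = kde(x_unl, grid)     # estimate of the unlabeled density p(x)

def objective(pi):
    """L1 partial-matching cost plus an illustrative overestimation penalty."""
    l1 = np.sum(np.abs(pi * p_plus - p_unl)) * dx
    penalty = np.sum(np.maximum(pi * p_plus - p_unl, 0.0)) * dx
    return l1 + penalty

# Grid search over pi; the paper derives an analytic solution instead.
pis = np.linspace(0.0, 1.0, 101)
pi_hat = pis[np.argmin([objective(p) for p in pis])]
print(f"estimated class prior: {pi_hat:.2f}")  # should be close to true_pi
```

Because the two class-conditional densities are well separated in this toy setup, the objective drops steeply as pi approaches the true prior and rises once pi * p+(x) overshoots p(x), so the grid search recovers a value near 0.7. The paper replaces this brute-force search with a closed-form estimator and gives finite-sample error bounds.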

