
Estimating the class prior in positive and unlabeled data through decision tree induction


Abstract

For tasks such as medical diagnosis and knowledge base completion, a classifier may only have access to positive and unlabeled examples, where the unlabeled data consists of both positive and negative examples. One way to enable learning from this type of data is to know the true class prior. In this paper, we propose a simple yet effective method for estimating the class prior by estimating the probability that a positive example is selected to be labeled. Our key insight is that subdomains of the data give a lower bound on this probability. This lower bound gets closer to the real probability as the ratio of labeled examples increases. Finding such subsets can naturally be done via top-down decision tree induction. Experiments show that our method makes estimates that are as accurate as those of state-of-the-art methods, and is an order of magnitude faster.
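
The lower-bound argument can be illustrated with a short sketch. It assumes that positives are selected for labeling with one constant probability c, so the labeled fraction of any subset equals c times that subset's positive fraction and cannot exceed c in expectation. The function names, the `min_size` guard, and the toy counts are illustrative assumptions. This is not the authors' decision-tree procedure, which finds the subsets automatically, and it ignores sampling noise.

```python
from typing import Sequence


def label_frequency_lower_bound(
    subsets: Sequence[Sequence[int]], min_size: int = 1
) -> float:
    """Return the largest labeled fraction over the candidate subsets.

    Each subset is a sequence of 0/1 flags (1 = example is labeled).
    Under the constant-labeling-probability assumption, each fraction is
    a lower bound on c in expectation, so the maximum is the tightest one.
    `min_size` is a hypothetical guard against tiny, noisy subsets.
    """
    fractions = [sum(s) / len(s) for s in subsets if len(s) >= min_size]
    if not fractions:
        raise ValueError("no subset has at least min_size examples")
    return max(fractions)


def class_prior_from_label_frequency(flags: Sequence[int], c: float) -> float:
    """Class prior = P(labeled) / c, since P(labeled) = c * P(positive)."""
    return (sum(flags) / len(flags)) / c


# Toy data built by hand with true c = 0.5 and true class prior = 0.7:
# region A has 60 examples, all positive, 30 of them labeled;
# region B has 40 examples, 10 positive, 5 of them labeled.
region_a = [1] * 30 + [0] * 30
region_b = [1] * 5 + [0] * 35
everything = region_a + region_b

c_hat = label_frequency_lower_bound([region_a, region_b, everything])
prior_hat = class_prior_from_label_frequency(everything, c_hat)
print(c_hat, prior_hat)
```

In this toy case the purely positive region A makes the bound tight, which mirrors the abstract's remark that the bound approaches the real probability as the share of labeled examples in a subset grows.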

Bibliographic record

  • Authors

    Bekker Jessa; Davis Jesse;

  • Year: 2018
  • Original format: PDF
  • Language: en
