European Conference on Machine Learning (ECML 2007); 2007-09-17 to 2007-09-21; Warsaw (PL)

Learning to Classify Documents with Only a Small Positive Training Set


Abstract

Many real-world classification applications fall into the class of positive and unlabeled (PU) learning problems. In many such applications, not only may the negative training examples be missing, but the number of positive examples available for learning may also be fairly limited, because hand-labeling a large number of training examples is impractical. Current PU learning techniques have focused mostly on identifying reliable negative instances from the unlabeled set U. In this paper, we address the oft-overlooked PU learning problem that arises when the number of training examples in the positive set P is small. We propose a novel technique, LPLP (Learning from Probabilistically Labeled Positive examples), and apply it to classify product pages from commercial websites. The experimental results demonstrate that our approach significantly outperforms existing methods, even in the challenging cases where the positive examples in P and the hidden positive examples in U are not drawn from the same distribution.
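The abstract does not spell out the LPLP algorithm itself, but the general PU setting it describes (a small positive set P, a large unlabeled set U, and probabilistic positive labels assigned to documents in U) can be sketched roughly as below. This is a minimal, hypothetical illustration assuming TF-IDF features, a centroid-similarity heuristic for the probabilistic labels, and a weighted logistic regression; it is not the paper's LPLP method, and the function names and toy data are invented for the example.

# Hypothetical sketch of the PU setting described above: a small positive set P,
# an unlabeled set U, probabilistic positive labels assigned to U, and a classifier
# trained on the weighted result. This is NOT the paper's LPLP algorithm; the
# centroid-similarity heuristic, names, and data are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

def train_pu_sketch(pos_docs, unlabeled_docs):
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(pos_docs + unlabeled_docs)
    X_pos, X_unl = X[:len(pos_docs)], X[len(pos_docs):]

    # Probabilistic labels for U: cosine similarity to the positive centroid, rescaled to [0, 1].
    centroid = np.asarray(X_pos.mean(axis=0))
    sim = cosine_similarity(X_unl, centroid).ravel()
    p_pos = (sim - sim.min()) / (sim.max() - sim.min() + 1e-12)

    # Each unlabeled document contributes as a weighted positive and a weighted negative example.
    X_train = np.vstack([X_pos.toarray(), X_unl.toarray(), X_unl.toarray()])
    y_train = np.concatenate([np.ones(X_pos.shape[0]), np.ones(len(p_pos)), np.zeros(len(p_pos))])
    w_train = np.concatenate([np.ones(X_pos.shape[0]), p_pos, 1.0 - p_pos])

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train, sample_weight=w_train)
    return vec, clf

# Toy usage: classify product-page snippets with only a few labeled positives.
P = ["digital camera 12 megapixel zoom lens", "dslr camera body with kit lens"]
U = ["compact camera with optical zoom", "laptop with 16gb ram", "ergonomic office chair"]
vectorizer, model = train_pu_sketch(P, U)
print(model.predict_proba(vectorizer.transform(["mirrorless camera with zoom lens"]))[:, 1])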
