European Conference on Machine Learning (ECML 2007); 2007-09-17 to 2007-09-21; Warsaw (PL)

Learning to Classify Documents with Only a Small Positive Training Set


Abstract

Many real-world classification applications fall into the class of positive and unlabeled (PU) learning problems. In many such applications, not only may the negative training examples be missing, but the number of positive examples available for learning may also be fairly limited, because hand-labeling a large number of training examples is impractical. Current PU learning techniques have focused mostly on identifying reliable negative instances from the unlabeled set U. In this paper, we address the oft-overlooked PU learning problem that arises when the number of training examples in the positive set P is small. We propose a novel technique, LPLP (Learning from Probabilistically Labeled Positive examples), and apply it to classify product pages from commercial websites. The experimental results demonstrate that our approach significantly outperforms existing methods, even in the challenging cases where the positive examples in P and the hidden positive examples in U are not drawn from the same distribution.
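The abstract does not spell out the LPLP algorithm itself, but the general PU setting it describes (a small positive set P, a large unlabeled set U, and probabilistic positive labels assigned to documents in U) can be sketched roughly as below. This is a minimal, hypothetical illustration assuming TF-IDF features, a centroid-similarity heuristic for the probabilistic labels, and a weighted logistic regression; it is not the paper's LPLP method, and the function names and toy data are invented for the example.

# Hypothetical sketch of the PU setting described above: a small positive set P,
# an unlabeled set U, probabilistic positive labels assigned to U, and a classifier
# trained on the weighted result. This is NOT the paper's LPLP algorithm; the
# centroid-similarity heuristic, names, and data are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

def train_pu_sketch(pos_docs, unlabeled_docs):
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(pos_docs + unlabeled_docs)
    X_pos, X_unl = X[:len(pos_docs)], X[len(pos_docs):]

    # Probabilistic labels for U: cosine similarity to the positive centroid, rescaled to [0, 1].
    centroid = np.asarray(X_pos.mean(axis=0))
    sim = cosine_similarity(X_unl, centroid).ravel()
    p_pos = (sim - sim.min()) / (sim.max() - sim.min() + 1e-12)

    # Each unlabeled document contributes as a weighted positive and a weighted negative example.
    X_train = np.vstack([X_pos.toarray(), X_unl.toarray(), X_unl.toarray()])
    y_train = np.concatenate([np.ones(X_pos.shape[0]), np.ones(len(p_pos)), np.zeros(len(p_pos))])
    w_train = np.concatenate([np.ones(X_pos.shape[0]), p_pos, 1.0 - p_pos])

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train, sample_weight=w_train)
    return vec, clf

# Toy usage: classify product-page snippets with only a few labeled positives.
P = ["digital camera 12 megapixel zoom lens", "dslr camera body with kit lens"]
U = ["compact camera with optical zoom", "laptop with 16gb ram", "ergonomic office chair"]
vectorizer, model = train_pu_sketch(P, U)
print(model.predict_proba(vectorizer.transform(["mirrorless camera with zoom lens"]))[:, 1])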
