Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages

Abstract

Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to a given ad landing page and have focussed on making recommendations from a small set of phrases extracted (and expanded) from the page using NLP and ranking-based techniques. In this paper, we eschew this paradigm, and demonstrate that it is possible to efficiently predict the relevant subset of queries from a large set of monetizable ones by posing the problem as a multi-label learning task with each query being represented by a separate label. We develop Multi-label Random Forests to tackle problems with millions of labels. Our proposed classifier has prediction costs that are logarithmic in the number of labels and can make predictions in a few milliseconds using 10 Gb of RAM. We demonstrate that it is possible to generate training data for our classifier automatically from click logs without any human annotation or intervention. We train our classifier on tens of millions of labels, features and training points in less than two days on a thousand-node cluster. We develop a sparse semi-supervised multi-label learning formulation to deal with training set biases and noisy labels harvested automatically from the click logs. This formulation is used to infer a belief in the state of each label for each training ad, and the random forest classifier is extended to train on these beliefs rather than the given labels. Experiments reveal significant gains over ranking- and NLP-based techniques on a large test set of 5 million ads using multiple metrics.
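The abstract's claim that prediction cost can be logarithmic in the number of labels is easier to see with a toy example. The sketch below is not the authors' Multi-label Random Forest; it is a minimal illustration, under assumed details, of how tree-based multi-label prediction can avoid scoring every label: each page is routed to one leaf per tree, and only the labels stored at those leaves are ranked. The names `Node`, `leaf_for`, and `recommend`, the word-presence splits, and all words, phrases, and scores are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """A tree node: internal nodes test one word, leaves hold label scores."""
    feature: Optional[str] = None      # word tested at an internal node
    present: Optional["Node"] = None   # child followed if the word is on the page
    absent: Optional["Node"] = None    # child followed otherwise
    label_scores: Dict[str, float] = field(default_factory=dict)  # used at leaves


def leaf_for(tree: Node, page_words: Set[str]) -> Node:
    """Walk from the root to a leaf; the cost is the tree depth, not the label count."""
    node = tree
    while node.feature is not None:
        node = node.present if node.feature in page_words else node.absent
    return node


def recommend(forest: List[Node], page_words: Set[str], k: int) -> List[str]:
    """Average the leaf label scores over all trees and return the top-k labels."""
    totals: Counter = Counter()
    for tree in forest:
        for label, score in leaf_for(tree, page_words).label_scores.items():
            totals[label] += score / len(forest)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


# Two hand-built toy trees (made-up words, bid phrases, and scores).
tree_a = Node(
    feature="insurance",
    present=Node(label_scores={"cheap car insurance": 0.8, "auto insurance quotes": 0.6}),
    absent=Node(label_scores={"used cars": 0.7}),
)
tree_b = Node(
    feature="car",
    present=Node(label_scores={"cheap car insurance": 0.6, "used cars": 0.4}),
    absent=Node(label_scores={"home insurance": 0.9}),
)

page = {"car", "insurance", "quote"}
top = recommend([tree_a, tree_b], page, k=2)
print(top)
```

In this toy forest only the labels found at the reached leaves are ever touched, so the work per page depends on tree depth and leaf size rather than on the total number of candidate bid phrases.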

Bibliographic details

Source
International conference on world wide web, 2013, pp. 13-23 (11 pages)
Authors
Rahul Agrawal; Archit Gupta; Yashoteja Prabhu; Manik Varma
Original format: PDF
Keywords
Multi-Label Learning; Semi-Supervised Learning; Random Forests; Large Scale Learning; Bid Phrase Recommendation;

