
Applying Co-training to Clickthrough Data for Search Engine Adaptation

Abstract

The information on the World Wide Web is growing without bound, and users may have very diverse preferences in the pages they target through a search engine. It is therefore a challenging task to adapt a search engine to the needs of a particular community of users who share similar interests. In this paper, we propose a new algorithm, Ranking SVM in a Co-training Framework (RSCF). Essentially, the RSCF algorithm takes as input the clickthrough data, that is, the items in the search results that a user has clicked on, and generates adaptive rankers as output. By analyzing the clickthrough data, RSCF first divides the data into a labelled data set, which contains the items that have already been scanned, and an unlabelled data set, which contains the items that have not yet been scanned. The labelled data is then augmented with unlabelled data to obtain a larger data set for training the rankers. We demonstrate that the RSCF algorithm produces better ranking results than the standard Ranking SVM algorithm. Based on RSCF, we develop a metasearch engine that comprises MSNSearch, Wisenut, and Overture, and carry out an online experiment to show that our metasearch engine outperforms Google.
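
The labelled/unlabelled split described in the abstract can be illustrated with a minimal sketch. The abstract does not say how "scanned" items are identified or how preferences are extracted, so two assumptions are made here purely for illustration: the user is assumed to read the result list top-down and stop at the lowest clicked item, and a clicked item is assumed to be preferred over unclicked items ranked above it. The function names `split_scanned` and `preference_pairs` are hypothetical and not taken from the paper; the co-training step itself is not shown.

```python
from typing import List, Set, Tuple


def split_scanned(results: List[str], clicked: Set[str]) -> Tuple[List[str], List[str]]:
    """Split a ranked result list into scanned and unscanned items.

    Assumption (not from the paper): the user scans top-down and
    stops at the lowest-ranked item that was clicked.
    """
    last_click = max(
        (i for i, item in enumerate(results) if item in clicked),
        default=-1,  # no clicks: nothing is treated as scanned
    )
    return results[: last_click + 1], results[last_click + 1 :]


def preference_pairs(scanned: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Derive (preferred, less_preferred) pairs from the scanned items.

    Assumption (not from the paper): a clicked item is preferred over
    every unclicked item that was ranked above it.
    """
    pairs = []
    for i, item in enumerate(scanned):
        if item in clicked:
            pairs.extend(
                (item, above) for above in scanned[:i] if above not in clicked
            )
    return pairs


# Hypothetical result list and clicks for one query.
results = ["a", "b", "c", "d", "e"]
clicked = {"b", "d"}

labelled, unlabelled = split_scanned(results, clicked)
pairs = preference_pairs(labelled, clicked)
```

Under these assumptions, `labelled` would supply preference pairs for training a ranker, while the items in `unlabelled` are the ones a co-training procedure would try to label and add to the training set.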
