International Conference on Algorithmic Learning Theory

Extracting Information from the Web for Concept Learning and Collaborative Filtering William

Abstract

Previous work on extracting information from the web generally makes few assumptions about how the extracted information will be used. As a consequence, the goal of web-based extraction systems is usually taken to be the creation of high-quality, noise-free data with clear semantics. This is a difficult problem which cannot be completely automated. Here we consider instead the problem of extracting web data for certain machine learning systems: specifically, collaborative filtering (CF) and concept learning (CL) systems. CF and CL systems are highly tolerant of noisy input, and hence much simpler extraction systems can be used in this context. For CL, we will describe a simple method that uses a given set of web pages to construct new features, which reduce the error rate of learned classifiers in a wide variety of situations. For CF, we will describe a simple method that automatically collects useful information from the web without any human intervention. The collected information, represented as “pseudo-users”, can be used to “jumpstart” a CF system when the user base is small (or even absent).
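The abstract does not spell out how pseudo-users are built, so the following is only a minimal sketch of one possible reading: each web page that mentions two or more known items is treated as a pseudo-user who "likes" those items, and a simple neighborhood-based recommender is run over the resulting profiles. The function names, the substring matching, the Jaccard weighting, and the sample pages are all illustrative assumptions, not the method of the paper.

```python
from collections import Counter


def pseudo_users_from_pages(pages, known_items):
    """Build one hypothetical pseudo-user per page: the set of known items it mentions."""
    users = {}
    for page_id, text in pages.items():
        lowered = text.lower()
        # Naive substring matching; noisy on purpose, since CF tolerates noisy input.
        liked = {item for item in known_items if item.lower() in lowered}
        if len(liked) >= 2:  # a page naming one item carries no co-occurrence signal
            users[f"pseudo:{page_id}"] = liked
    return users


def jaccard(a, b):
    """Set overlap in [0, 1], used here as the user-to-user similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target_likes, all_users, top_n=3):
    """Score unseen items by the similarity of the (pseudo-)users who like them."""
    scores = Counter()
    for likes in all_users.values():
        weight = jaccard(target_likes, likes)
        if weight == 0.0:
            continue
        for item in likes - target_likes:
            scores[item] += weight
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Made-up sample data, for illustration only.
pages = {
    "p1": "Fans of Miles Davis often enjoy John Coltrane and Bill Evans.",
    "p2": "Top picks: John Coltrane, Thelonious Monk.",
    "p3": "An essay about Bill Evans.",
}
known_items = ["Miles Davis", "John Coltrane", "Bill Evans", "Thelonious Monk"]

pseudo = pseudo_users_from_pages(pages, known_items)
print(recommend({"John Coltrane"}, pseudo))
```

In this sketch the recommender works with no real users at all, which is the "jumpstart" situation the abstract describes. Real user profiles could later be added to the same dictionary alongside the pseudo-users.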
