
Web-based classification using machine learning approaches.


Abstract

This thesis describes an application that automates web page classification based on the Yahoo hierarchy. Machine learning approaches developed for text data are applied here to the hierarchical classification structure. The large number of features is reduced by taking the hierarchical structure into account and by using feature selection based on a method known from information retrieval (IR). Documents are represented as feature vectors based on the standard “bag of words” model commonly used when learning on text data. Based on the hierarchical structure, the problem is divided into different web sites, each representing one document in the categories included in the Yahoo hierarchy. The result of learning is a set of rules, each used to estimate the probability that a new example is a member of the corresponding category. For each new page, our classifier applies a voting technique over the n sets of rules that match a category and predicts the category that receives the highest combined score. Thus we use only the most accurate of all applicable rules to classify a new page. (Abstract shortened by UMI.)
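The rule-selection step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the rule format, the category names, the confidence scores, and the function names `bag_of_words` and `classify` are all hypothetical, and the feature selection step is omitted because the abstract does not say which IR method is used. The sketch only shows the general idea of building a bag-of-words representation and predicting the category of the highest-scoring applicable rule.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count word occurrences (word order is ignored)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Hypothetical learned rules: (category, words that must all occur, score).
# Categories and scores are made up for illustration only.
RULES = [
    ("Computers/Internet", {"web", "browser"}, 0.9),
    ("Computers/Software", {"software"}, 0.6),
    ("Science/Biology", {"cell", "protein"}, 0.8),
]


def classify(text, rules=RULES):
    """Return (category, score) for the highest-scoring applicable rule,
    or None when no rule matches the page."""
    words = bag_of_words(text)
    applicable = [
        (score, category)
        for category, required, score in rules
        if required.issubset(words)  # every required word occurs in the page
    ]
    if not applicable:
        return None
    score, category = max(applicable)  # keep only the best applicable rule
    return category, score
```

For example, `classify("A new web browser and software release")` matches the first two hypothetical rules and returns the category of the higher-scoring one, while a page that matches no rule yields `None`.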

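Bibliographic record

  • Author

    Huang, Yanhui

  • Author affiliation

    The University of Regina (Canada)

  • Degree-granting institution: The University of Regina (Canada)
  • Subject: Information Science; Artificial Intelligence
  • Degree: M.Sc.
  • Year: 2002
  • Pages: 102 p.
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Information and knowledge dissemination; Artificial intelligence theory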
