
An investigation of several document classification algorithms leading to the design of an autonomous software agent for locating specific, relevant information on the World Wide Web.



Abstract

The goal of the research described in this thesis was to design an autonomous software agent that can locate specific, relevant information on the World Wide Web. The first chapter provides the motivation behind this project and a brief overview of the challenges associated with it. The next chapter presents the analysis that led to the development of a new, improved version of the computer program ITRule. The improvements consist of a new document classification algorithm that outperforms the previous one; significantly enhanced support for data exploration, i.e., the process of extracting information from raw data; and a new algorithm for quantizing numeric variables so that ITRule can use them. The third part of the thesis compares the performance of three versions of ITRule, two versions of the Naive Bayes classifier, several neural networks, the decision tree algorithm CART, and a linear support vector machine, in order to determine which is best suited for selecting relevant web pages. An analysis of the test results shows that a new ITRule classification algorithm, based on cross-validation combined with the J-measure, performs best. The fourth and final part of the thesis describes how some of these results were used in the design of a user-friendly, autonomous software agent called Poirot, which can help World Wide Web users stay up to date on new developments in topics of interest.
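The abstract names the J-measure but does not define it. As a rough illustration only, the sketch below computes the J-measure in its standard form from the rule-induction literature for a single rule "IF Y=y THEN X=x" with a binary outcome: J = p(y) [ p(x|y) log2(p(x|y)/p(x)) + (1 - p(x|y)) log2((1 - p(x|y))/(1 - p(x))) ]. The function name, the argument names, and the example probabilities are assumptions made for this sketch. It is not the thesis's classification algorithm, and it does not show the cross-validation step.

```python
import math


def j_measure(p_y: float, p_x: float, p_x_given_y: float) -> float:
    """J-measure, in bits, of the rule 'IF Y=y THEN X=x' (binary X).

    p_y          probability that the rule's condition holds
    p_x          prior probability of the conclusion
    p_x_given_y  probability of the conclusion when the condition holds

    All names are hypothetical; this is the textbook formula only.
    """
    if not 0.0 < p_x < 1.0:
        raise ValueError("p_x must lie strictly between 0 and 1")
    if not (0.0 <= p_y <= 1.0 and 0.0 <= p_x_given_y <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")

    def term(posterior: float, prior: float) -> float:
        # By convention 0 * log(0 / q) is taken to be 0.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    # Relative entropy between the posterior and prior over {x, not x},
    # weighted by how often the rule fires.
    divergence = term(p_x_given_y, p_x) + term(1.0 - p_x_given_y, 1.0 - p_x)
    return p_y * divergence


if __name__ == "__main__":
    # Made-up numbers: a word appears in 20% of pages, 10% of all pages
    # are relevant, and 60% of pages containing the word are relevant.
    print(j_measure(p_y=0.2, p_x=0.1, p_x_given_y=0.6))
```

A rule scores zero when its condition tells us nothing new about the conclusion (posterior equals prior), and scores higher when it both fires often and shifts the probability of the conclusion substantially.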

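Bibliographic record

  • Author

    Lindal, John

  • Author affiliation

    California Institute of Technology

  • Degree-granting institution: California Institute of Technology
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 175 p.
  • Total pages: 175
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology; computer technology
  • Keywords

  • Date added to database: 2022-08-17 11:47:14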
