
Web crawler indexing: An approach by clustering.

Abstract

Data mining is a class of database applications that looks for hidden patterns in a group of data that can be used to predict future behavior. Knowledge discovery in databases (KDD) is the process of extracting previously unknown, potentially useful information from data; it uses the raw results of data mining and transforms them into understandable information. With the growth of online data on the web, both the opportunity and the need to apply data mining techniques to effective web information retrieval have arisen. Web crawlers, also known as agents, robots, or spiders, are programs that work continuously behind the scenes; their essential role is to download information from the web and maintain an index of the downloaded pages.

With the enormous growth in the number of web sites and pages, indexing has become a major bottleneck for effective querying and information search. This thesis outlines the algorithm and techniques required to build an efficient web crawler, named TechSpider, which implements a hierarchical clustering method as a step toward indexing the downloaded information. This clustering improves the efficiency of searching and querying the index for a particular keyword. The thesis focuses on the algorithm that implements hierarchical clustering based on keywords.
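
The abstract does not spell out TechSpider's clustering algorithm. As a rough illustration only, the following is a minimal sketch of one generic form of keyword-based hierarchical clustering: single-linkage agglomerative clustering of pages by the Jaccard similarity of their keyword sets, followed by a keyword-to-cluster lookup table so that a query scans only the clusters that can contain a match. The similarity measure, the linkage rule, the `threshold` stopping criterion, the function names, and the sample pages are all assumptions made for illustration, not the method described in the thesis.

```python
# Hypothetical sketch: generic keyword-based hierarchical clustering, NOT TechSpider's
# actual algorithm (the abstract does not specify it). Requires Python 3.9+.
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Keyword-set similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(pages: dict[str, set[str]], threshold: float):
    """Single-linkage agglomerative clustering of pages by keyword overlap.

    Starts with one cluster per page and repeatedly merges the most similar
    pair of clusters until no pair reaches `threshold` (an assumed stopping rule).
    Returns the final clusters and the merge history (the dendrogram levels).
    """
    clusters = [frozenset([url]) for url in pages]
    merges = []

    def linkage(c1: frozenset, c2: frozenset) -> float:
        # Single linkage: similarity of the closest pair of member pages.
        return max(jaccard(pages[p], pages[q]) for p in c1 for q in c2)

    while len(clusters) > 1:
        # Find the most similar pair; on ties, max() keeps the first pair found.
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merges.append((clusters[i], clusters[j], best))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


def build_cluster_index(pages, clusters):
    """Map each keyword to the ids of the clusters whose pages mention it."""
    index: dict[str, set[int]] = {}
    for cid, cluster in enumerate(clusters):
        for url in cluster:
            for kw in pages[url]:
                index.setdefault(kw, set()).add(cid)
    return index


def search(keyword, pages, clusters, index):
    """Return matching pages, scanning only the clusters the index points to."""
    hits = []
    for cid in index.get(keyword, ()):
        hits.extend(url for url in clusters[cid] if keyword in pages[url])
    return sorted(hits)


# Hypothetical sample data: page URL -> keywords extracted from that page.
pages = {
    "a.html": {"crawler", "index", "spider"},
    "b.html": {"crawler", "spider", "robot"},
    "c.html": {"clustering", "dendrogram", "kdd"},
    "d.html": {"clustering", "kdd", "mining"},
    "e.html": {"recipes", "baking"},
}
clusters, merges = agglomerate(pages, threshold=0.3)
index = build_cluster_index(pages, clusters)
print(search("crawler", pages, clusters, index))  # ['a.html', 'b.html']
```

With `threshold=0.3`, the two crawler pages and the two data-mining pages each form a cluster while the unrelated page stays on its own, so a lookup for `crawler` touches one cluster instead of every page. The naive pairwise search makes each merge step quadratic in the number of clusters, which is acceptable for a toy example but not for a large index; libraries such as SciPy's `scipy.cluster.hierarchy` provide more efficient hierarchical clustering.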

Bibliographic details

  • Author

    Menon, Dhanya C.

  • Author affiliation

    University of Nevada, Reno

  • Degree-granting institution: University of Nevada, Reno
  • Subjects: Computer science; Information science
  • Degree: M.S.
  • Year: 2004
  • Pages: 76
  • Original format: PDF
  • Language: English