Source: Innovations in computational intelligence (conference proceedings)

URL-Based Relevance-Ranking Approach to Facilitate Domain-Specific Crawling and Searching

Abstract

The WWW is a vast repository of every type of information known to mankind and can therefore serve the frequently varying needs of its users. Classifying and organizing webpages according to their domain or topic helps a search engine retrieve and return a set of fairly relevant pages to users. This classification is generally done on the basis of the pages' underlying text or content. This paper introduces a novel approach that tries to predict the relevance of a webpage to a domain not by downloading its content but on the basis of the web documents it is linked to. The approach offers advantages in cost and performance, since the most easily obtained and least expensive information available about a webpage is its uniform resource locator (URL) [1]. Because URLs serve as unique identifiers, they are assumed to be an important source of information about the content of a web page, and the proposed approach therefore associates domain information with web pages based on their URLs.
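The abstract does not describe the scoring procedure, so the following is only a minimal sketch of what URL-based relevance ranking could look like, not the authors' method. It assumes a hypothetical hand-built vocabulary (`DOMAIN_TERMS`) and a simple token-overlap score; the function names `url_tokens`, `relevance`, and `rank` are likewise illustrative.

```python
import re
from urllib.parse import urlparse

# Hypothetical domain vocabularies; the abstract does not specify any.
DOMAIN_TERMS = {
    "sports": {"sports", "football", "cricket", "league", "score"},
    "health": {"health", "medicine", "clinic", "diet", "fitness"},
}

def url_tokens(url):
    """Split the host, path, and query of a URL into lowercase alphabetic tokens."""
    parts = urlparse(url)
    text = " ".join([parts.netloc, parts.path, parts.query]).lower()
    return [t for t in re.split(r"[^a-z]+", text) if t]

def relevance(url, domain):
    """Assumed score: fraction of URL tokens found in the domain's term set."""
    tokens = url_tokens(url)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in DOMAIN_TERMS[domain])
    return hits / len(tokens)

def rank(urls, domain):
    """Order URLs by descending relevance without fetching any page content."""
    return sorted(urls, key=lambda u: relevance(u, domain), reverse=True)
```

A crawler could call `rank(frontier_urls, "sports")` to decide which links to fetch first. The point of the sketch is that the score is computed from the URL string alone, which is the efficiency argument the abstract makes.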