URL-Based Relevance-Ranking Approach to Facilitate Domain-Specific Crawling and Searching


Abstract

The World Wide Web is a vast repository of every type of information known to mankind and can therefore serve the frequently changing needs of its users. Classifying and organizing webpages by domain or topic helps a search engine retrieve and return a set of fairly relevant pages. This classification is generally based on a page's underlying text or content. This paper introduces a novel approach that tries to predict the relevance of a webpage to a domain not by downloading its content but from the web documents it is linked to. The approach is efficient in cost and performance because the most easily obtained and least expensive information about a webpage is its uniform resource locator (URL) [1]. Since a URL serves as a unique identifier, it is assumed to be an important indicator of the page's content; the proposed approach therefore associates domain information with web pages based on their URLs.
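
The abstract does not give the scoring procedure, so the following is only a minimal sketch of the general idea of ranking pages by their URLs without downloading them. It assumes a simple keyword-overlap score; the keyword set `DOMAIN_KEYWORDS` and the functions `url_tokens`, `relevance`, and `rank_urls` are hypothetical names chosen for illustration and are not taken from the paper.

```python
import re
from urllib.parse import urlsplit

# Hypothetical keyword set for a "books" domain (an assumption, not from the paper).
DOMAIN_KEYWORDS = {"book", "books", "author", "isbn", "library", "novel"}


def url_tokens(url):
    """Split the host and path of a URL into lowercase alphabetic tokens."""
    parts = urlsplit(url)
    text = f"{parts.netloc} {parts.path}".lower()
    return [t for t in re.split(r"[^a-z]+", text) if t]


def relevance(url, keywords=DOMAIN_KEYWORDS):
    """Fraction of URL tokens that are domain keywords (0.0 if there are no tokens)."""
    tokens = url_tokens(url)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in keywords)
    return hits / len(tokens)


def rank_urls(urls, keywords=DOMAIN_KEYWORDS):
    """Return the URLs sorted by descending relevance score; no page is fetched."""
    return sorted(urls, key=lambda u: relevance(u, keywords), reverse=True)
```

A crawler could use such a score to order its frontier so that URLs that look on-topic are visited first. The keyword-fraction score is only one possible choice and may differ from the ranking the authors propose.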

Bibliographic record

Source
Innovations in Computational Intelligence | 2016 | pp. 239-250 | 12 pages
Authors
Sonali Gupta; Komal Kumar Bhatia
Affiliation
Department of Computer Engineering, YMCA University of Science and Technology, Faridabad, India
Original format: PDF
Language: English
Keywords
URL; Domain identification; Topic-specific Web page classification; Crawler; Search engine; Domain-specific Focused crawler; Hidden-web crawler
Date indexed: 2022-08-26 14:02:31

Similar documents

1. Heuristic-based strategy for Phishing prediction: A survey of URL-based approach [J]. Revoredo da Silva Carlo Marcelo, Feitosa Eduardo Luzeiro, Garcia Vinicius Cardoso. Computers & Security, 2020, issue Jana.
2. A Domain-Specific Concept-Based Searching System [J]. Tru H. Cao, Mai T. H. Ta, Tung Q. Luong. 電子情報通信学会技術研究報告. 人工知能と知識処理 (Artificial Intelligence and Knowledge Based Processing), 2004, No. 488.
3. Crawling Strategies of Reverse Searching and Incremental Two-Level Site Prioritizing System [J]. Chinmai Daka, Julie Shabna S. Research Journal of Pharmaceutical, Biological and Chemical Sciences, 2016, No. 4.
4. URL-Based Relevance-Ranking Approach to Facilitate Domain-Specific Crawling and Searching [C]. Sonali Gupta, Komal Kumar Bhatia. International conference on recent developments in science, engineering and technology, 2018.
5. A novel hybrid focused crawling algorithm to build domain-specific collections [D]. Chen, Yuxin. 2007.
6. N-Glycans on EGF domain-specific O-GlcNAc transferase (EOGT) facilitate EOGT maturation and peripheral endoplasmic reticulum localization [O]. Sayad Md. Didarul Alam, Yohei Tsukamoto, Mitsutaka Ogawa. 2020.
7. DSphere: A Source-Centric Approach to Crawling, Indexing and Searching the World Wide Web [O]. Bhuvan Bamba, Ling Liu, James Caverlee. 2013.
