Focused Crawling Using Navigational Rank

Abstract

The goal of focused crawling is to use limited resources to discover web pages related to a specific topic effectively, rather than downloading all accessible web documents. The major challenge in focused crawling is how to determine each hyperlink's capability of leading to target pages. To compute this capability, we present a novel approach called Navigational Rank (NR). In general, NR is a two-step, two-direction credit propagation approach. Compared to existing methods, NR has three main advantages. First, NR is dynamically updated as the crawl progresses, so it adapts well to different website structures. Second, when the crawling seed is far from the target pages and the target pages constitute only a small portion of the whole website, NR shows a significant performance advantage. Third, NR computes each link's capability of leading to target pages by considering both the target and non-target pages it leads to. This global knowledge yields better performance than using target pages alone. We have performed extensive experiments to evaluate the proposed approach using two groups of large-scale, real-world datasets from two different domains. The experimental results show that our approach is domain-independent and significantly outperforms state-of-the-art methods.
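
The abstract does not spell out how NR propagates credit, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a swapped-in technique, personalized PageRank run on the reversed link graph (the keywords mention personalized PageRank), so that credit starting at known target pages flows back to the pages that lead to them. It ignores non-target pages and the two-step, two-direction structure described above. The function name `backward_credit`, its parameters, and the toy site graph are all hypothetical.

```python
from collections import defaultdict


def backward_credit(links, targets, damping=0.85, iterations=50):
    """Personalized PageRank on the reversed link graph (illustrative only).

    links:   dict mapping a page to the list of pages it links to.
    targets: non-empty set of pages already judged on-topic; they hold
             the restart mass.
    Returns a dict page -> score; a higher score means the page tends
    to lead to target pages.
    """
    if not targets:
        raise ValueError("at least one target page is required")
    pages = set(links) | {dst for outs in links.values() for dst in outs}
    # Reverse the edges: credit flows from a page to the pages linking to it.
    parents = defaultdict(list)
    for src, outs in links.items():
        for dst in outs:
            parents[dst].append(src)
    restart = {p: (1.0 / len(targets) if p in targets else 0.0) for p in pages}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * restart[p] for p in pages}
        for page in pages:
            incoming = parents[page]
            if not incoming:
                # No page links here, so hand the mass back to the targets.
                for p in pages:
                    nxt[p] += damping * score[page] * restart[p]
                continue
            share = damping * score[page] / len(incoming)
            for parent in incoming:
                nxt[parent] += share
        score = nxt
    return score


# Hypothetical toy site: only the "products" branch leads to target pages.
links = {
    "home": ["products", "about"],
    "products": ["item1", "item2"],
    "about": ["contact"],
    "item1": [],
    "item2": [],
    "contact": [],
}
targets = {"item1", "item2"}
scores = backward_credit(links, targets)
for page in sorted(scores, key=scores.get, reverse=True):
    print(f"{page}: {scores[page]:.3f}")
```

In this toy graph a crawler standing at "home" would prefer the link to "products" over the link to "about", because only the former has accumulated credit from target pages. Rerunning the propagation as new pages are classified is one simple way to imitate the dynamic updating the abstract mentions.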

Bibliographic record

Source: ACM Conference on Information and Knowledge Management | 2010 | 4 pages
Authors: Shicong Feng; Li Zhang; Yuhong Xiong; Conglei Yao
Original format: PDF
Chinese Library Classification: Information processing
Keywords: focused crawling; navigational rank; personalized PageRank

Similar literature

1. Keyword weight optimization using gradient strategies in event focused web crawling [J]. Rajiv S., Navaneethan C. Pattern Recognition Letters. 2021, issue Feba.
2. Emotional attitudes towards procrastination in people: A large-scale sentiment-focused crawling analysis [J]. Chen Zhiyi, Zhang Rong, Xu Ting. Computers in Human Behavior. 2020, issue Sepa.
3. Focused Web Crawling for High Performance Search Engines: Issues, Techniques and Systems [J]. Sushil Kumar, Naresh Chauhan. International Journal of Computational Intelligence Theory and Practice. 2020, issue 1.
4. Focused Crawling Using Navigational Rank [C]. Shicong Feng, Li Zhang, Yuhong Xiong. CIKM 10; ACM Conference on Information and Knowledge Management. 2011.
5. A novel hybrid focused crawling algorithm to build domain-specific collections [D]. Chen, Yuxin. 2007.
6. Domain adaptation of statistical machine translation with domain-focused web crawling [O]. Pavel Pecina, Antonio Toral, Vassilis Papavassiliou.
7. Focused crawling from the basic approach to context aware notification architecture [O]. Venugopal Boppana, Sandhya P. 2019.
8. Focused Crawling of the Deep Web Using Service Class Descriptions [R]. Rocco, D., Liu, L., Critchlow, T. 2005.