Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2005); May 18-20, 2005; Hanoi, Vietnam

Collecting Topic-Related Web Pages for Link Structure Analysis by Using a Potential Hub and Authority First Approach

Abstract

Constructing a base set of topic-related web pages is a preliminary step for web mining algorithms that use link structure analysis based on HITS. However, apart from checking the anchor text of links and the content of pages, little research has addressed other ways to improve topic relevance while collecting the base set. In this paper, we propose a potential hub and authority first (PHA-first) approach, which uses the concepts of hub and authority to filter web pages. We survey the satisfaction of dozens of users with the pages recommended by our method and by HITS on different topics. The results indicate that our method is superior to HITS in most cases. In addition, we evaluate the recall and precision of our method. The results show that our method achieves relatively high precision but low recall for all topics.
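For readers unfamiliar with the hub and authority concept that the abstract builds on, the following is a minimal sketch of the standard HITS score iteration over a small link graph. It is not the PHA-first filtering procedure, which the abstract does not specify. The function name `hits`, the `links` dictionary format, and the `iterations` parameter are illustrative assumptions.

```python
import math


def hits(links, iterations=50):
    """Standard HITS iteration (illustrative sketch, not the PHA-first method).

    links: dict mapping a page to the list of pages it links to
           (a hypothetical input format chosen for this example).
    Returns two dicts: hub scores and authority scores.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub update: sum the authority scores of the pages linked to.
        new_hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return hub, auth


# Toy graph: "a" links to "c" and "d"; "b" links to "c" only.
hub, auth = hits({"a": ["c", "d"], "b": ["c"], "c": [], "d": []})
```

In this toy graph, "c" ends up with the highest authority score because both "a" and "b" point to it, and "a" ends up as the stronger hub because it points to more authoritative pages.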
