Journal: Computer Applications and Software (《计算机应用与软件》)

Research on an Information Search Method Based on Topic-Semantic URLs

Abstract

To improve the efficiency and harvest rate of topic-focused web crawlers, this paper proposes an information search method based on topic-semantic URLs. The method maps each seed URL onto a topic node of a topic tree and expands the seed URL's semantics with the topic text along that node's topic path, which guides the crawler to fetch on-topic pages efficiently and accurately. During crawling, it also uses link-importance and page-importance factors to automatically select and breed new high-quality seed URLs. The paper focuses on the principle of this search method and its implementation in the system. Experimental results show that the method effectively improves the crawler's search efficiency and harvest rate, and that its selection and breeding of seed links performs well.
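
The abstract describes the method only at a high level and gives no formulas, so the Python below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a topic tree's nodes carry descriptive text; that a seed URL's semantics are expanded by joining the text on the path from the root to its mapped node; that page relevance is cosine similarity over word counts; that seed breeding blends link importance and page importance linearly with a hypothetical weight `alpha`; and that harvest rate is, as is common in focused-crawling work, the fraction of fetched pages judged relevant. All names (`TopicNode`, `SemanticSeed`, `seed_score`, `breed_seeds`, `harvest_rate`), thresholds, and the `example.com` URL are hypothetical placeholders.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """Hypothetical topic-tree node: a label plus descriptive topic text."""
    name: str
    text: str
    parent: "TopicNode | None" = None


def topic_path_text(node):
    """Join topic text from the root down to `node` (assumed meaning of 'topic path')."""
    parts = []
    while node is not None:
        parts.append(node.text)
        node = node.parent
    return " ".join(reversed(parts))


def tokenize(text):
    """Lower-case word counts; a stand-in tokenizer (Chinese text would need word segmentation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-count vectors (assumed relevance measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class SemanticSeed:
    """A seed URL whose semantics are expanded with the text of its topic path."""
    url: str
    node: TopicNode
    terms: Counter = field(init=False)

    def __post_init__(self):
        self.terms = tokenize(topic_path_text(self.node))

    def relevance(self, page_text):
        """Score a fetched page against the expanded seed semantics."""
        return cosine(self.terms, tokenize(page_text))


def seed_score(link_importance, page_importance, alpha=0.5):
    """Hypothetical linear blend of the two importance factors; alpha is an assumed weight."""
    return alpha * link_importance + (1 - alpha) * page_importance


def breed_seeds(candidates, threshold=0.6, k=10):
    """From (url, link_importance, page_importance) tuples, keep up to k URLs scoring >= threshold, best first."""
    scored = [(seed_score(li, pi), url) for url, li, pi in candidates]
    kept = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    return [url for _, url in kept[:k]]


def harvest_rate(relevance_scores, cutoff=0.2):
    """Fraction of crawled pages judged on-topic (relevance >= an assumed cutoff)."""
    if not relevance_scores:
        return 0.0
    return sum(s >= cutoff for s in relevance_scores) / len(relevance_scores)


if __name__ == "__main__":
    # Map a placeholder seed URL onto the "crawling" node under a "computing" root.
    root = TopicNode("computing", "computer software")
    crawl = TopicNode("crawling", "web crawler search engine", parent=root)
    seed = SemanticSeed("http://example.com/", crawl)
    scores = [seed.relevance(t) for t in ["a web crawler for search", "cooking recipes"]]
    print(harvest_rate(scores))  # → 0.5
```

The linear blend in `seed_score` is chosen only because it keeps both factors visible and tunable; the abstract does not say how the paper actually combines link importance and page importance.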
