
A Focused Crawler for Borderlands Situation Information with Geographical Properties of Place Names


Abstract

Place names are an important component of borderlands situation information and play a significant role in collecting it from the Internet with focused crawlers. However, current focused crawlers treat a place name in the same way as any other common keyword, ignoring its geographical properties, which may reduce their effectiveness. To solve this problem, this paper first discusses the importance of place names in focused crawlers in terms of location and spatial relations, and then proposes a two-tuple-based topic representation method that expresses place names and common keywords separately. Spatial relations between place names are then introduced into the calculation of the relevance between given topics and webpages, which makes the calculation more accurate. On this basis, a focused crawler prototype for collecting borderlands situation information is designed and implemented. Crawling speed and F-Score are adopted to evaluate its efficiency and effectiveness, respectively. Experimental results indicate that the efficiency of the proposed focused crawler is consistent with the polite access interval and that it can meet the daily demand of borderlands situation information collection. Additionally, the F-Score of the proposed focused crawler increases by around 7%, which means that it is more effective than the traditional best-first focused crawler.
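The abstract reports effectiveness as an F-Score but does not give the formula used. As a minimal sketch, assuming the standard definition (the weighted harmonic mean of precision and recall, with beta = 1 giving the usual F1), the metric could be computed as follows. The function name, its parameters, and the page counts in the usage note are hypothetical and are not taken from the paper.

```python
def f_score(relevant_retrieved: int, retrieved: int, relevant_total: int,
            beta: float = 1.0) -> float:
    """Standard F-beta score for a crawl (assumed definition, not the paper's).

    relevant_retrieved: crawled pages judged relevant to the topic
    retrieved:          all pages the crawler downloaded
    relevant_total:     all relevant pages in the evaluation set
    """
    # Precision and recall are undefined when their denominators are zero;
    # this sketch returns 0.0 in that case.
    if retrieved == 0 or relevant_total == 0:
        return 0.0
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / relevant_total
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, with hypothetical counts, `f_score(60, 100, 120)` combines a precision of 0.6 and a recall of 0.5 into a single value; comparing that value for two crawlers over the same evaluation set is the kind of comparison the abstract describes.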
