FOCUSING WEB CRAWLS ON LOCATION-SPECIFIC CONTENT

Abstract

Retrieving relevant data for location-sensitive keyword queries is a challenging task that has so far been addressed as the problem of automatically determining the geographical orientation of web searches. However, identifying localizable queries is not in itself sufficient for successful location-sensitive search unless there is also a geo-referenced index of data sources against which such queries can be matched. In this paper, we propose a novel approach to the automatic construction of a geo-referenced search engine index. Our approach relies on a geo-focused crawler that incorporates a structural parser and uses GeoWordNet as a knowledge base to automatically deduce the geospatial information latent in page content. Based on location-descriptive elements in page URLs and anchor text, the crawler directs pages to a location-sensitive downloader. This downloading module resolves the geographical references of the URL location elements and organizes them into indexable hierarchical structures. The location-aware URL hierarchies are linked to their respective pages, resulting in a geo-referenced index against which location-sensitive queries can be answered.
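The pipeline described above (spot location words in URLs and anchor text, resolve them against a place hierarchy, and index pages under the resulting paths) can be illustrated with a minimal sketch. It is not the authors' implementation. The small `GAZETTEER` dictionary is a hypothetical stand-in for GeoWordNet, the structural parser is not modeled, and splitting URL hosts, paths and anchor text on non-letter characters is an assumed tokenization. The names `location_tokens`, `resolve`, `should_fetch` and `GeoIndex` are illustrative only.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical toy gazetteer standing in for GeoWordNet:
# each place name maps to its parent place (None marks a root).
GAZETTEER = {
    "europe": None,
    "greece": "europe",
    "athens": "greece",
    "patras": "greece",
    "france": "europe",
    "paris": "france",
}


def location_tokens(url, anchor_text=""):
    """Split the URL host, URL path and anchor text into lowercase word tokens."""
    parsed = urlparse(url)
    raw = " ".join([parsed.netloc, parsed.path, anchor_text]).lower()
    return [t for t in re.split(r"[^a-z]+", raw) if t]


def resolve(token):
    """Return the place hierarchy root-first, e.g. ['europe', 'greece', 'athens'], or None."""
    if token not in GAZETTEER:
        return None
    chain = []
    node = token
    while node is not None:
        chain.append(node)
        node = GAZETTEER[node]
    return list(reversed(chain))


def should_fetch(url, anchor_text=""):
    """Focused-crawl decision: follow a link only if it names a resolvable place."""
    return any(resolve(t) for t in location_tokens(url, anchor_text))


class GeoIndex:
    """Maps place-hierarchy paths to the URLs of pages that mention them."""

    def __init__(self):
        self._pages = defaultdict(set)  # tuple(path) -> set of URLs

    def add_page(self, url, anchor_text=""):
        """Index a page under every place path found in its URL or anchor text."""
        found = []
        for token in location_tokens(url, anchor_text):
            path = resolve(token)
            if path:
                self._pages[tuple(path)].add(url)
                found.append(path)
        return found

    def query(self, place):
        """Return URLs indexed under `place` or any place below it in the hierarchy."""
        hits = set()
        for path, urls in self._pages.items():
            if place in path:
                hits |= urls
        return sorted(hits)


if __name__ == "__main__":
    index = GeoIndex()
    links = [
        ("http://example.com/travel/greece/athens/hotels.html", "Hotels in Athens"),
        ("http://example.com/travel/france/paris/", "Paris museums"),
        ("http://example.com/about.html", "About us"),
    ]
    for url, anchor in links:
        if should_fetch(url, anchor):  # the /about.html link is skipped
            index.add_page(url, anchor)
    print(index.query("greece"))  # ['http://example.com/travel/greece/athens/hotels.html']
```

Storing each page under its full root-first path means a query for a broader region (e.g. `"europe"`) also returns pages tagged with places nested inside it. A real system would additionally need to disambiguate place names, which this toy lookup ignores.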