Conference: Twenty-First International Workshop on Database and Expert Systems Applications

Towards a Search System for the Web Exploiting Spatial Data of a Web Document

Abstract

In this paper, we describe our work in progress on information retrieval that exploits spatial data extracted from web documents. We discuss the problems of searching for web documents by geographic distance, where the geographic distance of a document is determined automatically using information extraction methods. We present our approach to building a distributed search system that addresses several problems in this area. Search by geographic distance is useful, for example, when looking for the nearest restaurant, hotel, or any other business near our location (the reference point). Almost every company today presents its business on the Internet, sharing business information along with contact information. Various kinds of geographic information can be extracted from the contact information (but not only from it) and used to compute the geographic distance of a document. By a document's geographic distance, we mean the distance between a search reference point and a geographic location related to the document. In our approach, we chose postal addresses and GPS coordinates for spatial data extraction. The reference point can be changed dynamically, and one document can be related to more than one geographic location. Geographic locations are discovered automatically in the document's textual content. The document is then indexed by all its known geographic locations, so that a later search can find it near any of the geographic locations to which it is related. The search domain is built automatically by crawling through linked web documents.
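
The notion of a document's geographic distance described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how distance is computed or how multiple locations are combined. The sketch assumes locations are already resolved to latitude/longitude pairs, uses the haversine great-circle formula, and takes the nearest of a document's locations as its distance. The names `haversine_km`, `document_distance_km`, and `rank_by_distance` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def document_distance_km(reference, locations):
    """Distance from the reference point to the nearest location of one document.

    Assumption: a document with several locations is ranked by its closest one.
    `locations` must be non-empty (documents without locations are not indexed).
    """
    ref_lat, ref_lon = reference
    return min(haversine_km(ref_lat, ref_lon, lat, lon) for lat, lon in locations)


def rank_by_distance(reference, index):
    """Return document identifiers sorted by distance to the reference point.

    `index` maps a document identifier to its list of (lat, lon) locations.
    """
    return sorted(index, key=lambda doc: document_distance_km(reference, index[doc]))
```

Because the reference point is an argument rather than part of the index, it can change between queries without re-indexing, which matches the dynamically changing reference point mentioned in the abstract.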