To process the geographic information in Web documents, this paper presents a novel method for automatically extracting the geographic scope of a text. The method assigns multi-scale geographic annotations to a document through a three-stage process for handling geographic semantics. First, toponyms in the text are recognized with the support of a geographic knowledge base. Second, ambiguous toponyms are disambiguated using both geographic and non-geographic semantics, and the resulting pieces of disambiguation evidence are combined using evidence theory. Finally, a geo-referenced tree of the text is constructed on the basis of a cognitive theory, and the focal geographic entities are computed from the semantic relationships among entities, which determines the geographic location of the text. The method was implemented in GeoSearcher, a prototype system for geographic information retrieval, and the evaluation results show that it achieves relatively high accuracy.
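The evidence-combination step in the second stage can be illustrated with a minimal sketch. It assumes that "evidence theory" refers to Dempster-Shafer theory and that the evidence is fused with Dempster's rule of combination; the abstract does not say which combination rule is used. The function name `combine_dempster`, the candidate referents, and all mass values below are hypothetical and are not taken from the paper.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of candidate referents to a mass
    in [0, 1]; the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            # Agreeing evidence: the product supports the intersection.
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            # Contradictory evidence: the product counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    return {subset: mass / norm for subset, mass in combined.items()}


# Hypothetical candidate referents for the ambiguous toponym "Springfield".
IL = "Springfield, Illinois"
MA = "Springfield, Massachusetts"
FRAME = frozenset({IL, MA})  # mass on the whole frame models ignorance

# Hypothetical evidence from geographic semantics (e.g. nearby toponyms).
geo_evidence = {frozenset({IL}): 0.6, FRAME: 0.4}
# Hypothetical evidence from non-geographic semantics (e.g. context words).
text_evidence = {frozenset({IL}): 0.5, frozenset({MA}): 0.2, FRAME: 0.3}

fused = combine_dempster(geo_evidence, text_evidence)
# Pick the single referent with the largest combined mass.
best = max((s for s in fused if len(s) == 1), key=fused.get)
print(sorted(best), round(fused[best], 3))
```

In this sketch, mass assigned to the whole frame represents a source's uncertainty, so a weakly informative source does not override a strongly informative one. How the paper actually derives masses from geographic and non-geographic semantics is not described in the abstract.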