Conference: IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies

A POI Categorization by Composition of Onomastic and Contextual Information


Abstract

Point of interest (POI) categorization is the task of finding the categories of POIs within a document. Because documents that contain POIs also contain clue words for identifying POI categories, the task can be solved as document classification. However, this approach misses two crucial factors for identifying the category of a POI. First, it pays no attention to onomastic information, even though POI names often reveal a great deal about their category. Second, it ignores the fact that most clue words for identifying a POI category are located near the POI name. This paper proposes a novel method that incorporates both onomastic and local contextual information in POI categorization. The proposed method uses support vector machines (SVMs) to categorize POIs. To utilize the onomastic information of POIs, the method adopts a string kernel that handles variations of POI names efficiently at the character level. The method also applies a Gaussian weighting to the content words in a document. By setting the mean of the Gaussian at the position of the POI name, the method assigns higher weights to words near the POI name and lower weights to words far from it. These two types of information are then combined by a composite kernel of the string kernel and a linear kernel with the Gaussian weighting. A series of experiments shows that SVMs with the combined information outperform those using either type of information alone.
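The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which string kernel is used, how the Gaussian is parameterized, or how the two kernels are combined, so the sketch makes the following assumptions: a character bigram spectrum kernel stands in for the string kernel, the Gaussian has a hypothetical width `sigma` measured in token positions, and the composite kernel is a convex combination with a hypothetical mixing weight `alpha`. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def gaussian_weights(num_tokens, poi_index, sigma=2.0):
    """Weight per token position: a Gaussian whose mean is the POI position."""
    return [math.exp(-((i - poi_index) ** 2) / (2.0 * sigma ** 2))
            for i in range(num_tokens)]


def weighted_context(tokens, poi_index, sigma=2.0):
    """Bag of words in which each occurrence counts by its Gaussian weight."""
    weights = gaussian_weights(len(tokens), poi_index, sigma)
    vec = defaultdict(float)
    for i, tok in enumerate(tokens):
        if i != poi_index:  # the POI mention itself is not a context word
            vec[tok.lower()] += weights[i]
    return dict(vec)


def linear_kernel(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v[t] for t, w in u.items() if t in v)


def spectrum_kernel(a, b, n=2):
    """Character n-gram spectrum kernel (assumed stand-in for the string kernel)."""
    ca = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    cb = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    return float(sum(c * cb[g] for g, c in ca.items() if g in cb))


def normalize(kxy, kxx, kyy):
    """Cosine normalization so that each kernel lies in [0, 1]."""
    denom = math.sqrt(kxx * kyy)
    return kxy / denom if denom > 0 else 0.0


def composite_kernel(x, y, alpha=0.5, sigma=2.0, n=2):
    """Assumed form: alpha * K_string(names) + (1 - alpha) * K_linear(contexts).

    Each example is a tuple (poi_name, tokens, poi_index).
    """
    name_x, name_y = x[0].lower(), y[0].lower()
    k_str = normalize(spectrum_kernel(name_x, name_y, n),
                      spectrum_kernel(name_x, name_x, n),
                      spectrum_kernel(name_y, name_y, n))
    cx = weighted_context(x[1], x[2], sigma)
    cy = weighted_context(y[1], y[2], sigma)
    k_lin = normalize(linear_kernel(cx, cy),
                      linear_kernel(cx, cx),
                      linear_kernel(cy, cy))
    return alpha * k_str + (1.0 - alpha) * k_lin


# Toy examples (invented for illustration, not from the paper).
pizza_a = ("Luigi's Pizza",
           ["we", "ate", "at", "<POI>", "and", "the", "pasta", "was", "great"], 3)
pizza_b = ("Mario's Pizza",
           ["dinner", "at", "<POI>", "was", "tasty", "pasta"], 2)
hospital = ("City Hospital",
            ["the", "doctor", "in", "<POI>", "seemed", "kind"], 3)

print(composite_kernel(pizza_a, pizza_b))   # similar names and contexts
print(composite_kernel(pizza_a, hospital))  # dissimilar names and contexts
```

A non-negative weighted sum of two valid kernels is itself a valid kernel, which is why the convex combination is a safe assumption here. A Gram matrix built from `composite_kernel` over all training pairs could then be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.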
