Foreign-language conference: International Symposium on Intelligent Data Analysis

Automatic POI Matching Using an Outlier Detection Based Approach

Abstract

Points of Interest (POIs) are now widely used in many applications, mainly because of the growing amount of related data available online, notably from volunteered geographic information (VGI) sources. Being able to connect these data across sources is useful for tasks such as validating, correcting, and deduplicating records in a database. However, there is no standard way to identify the same POI across different sources, and doing so manually can be very expensive. Automatic POI matching has therefore been an attractive research topic. In our work, we propose a novel data-driven machine learning approach, based on an outlier detection algorithm, to match POIs automatically. Surprisingly, the works presented so far do not use data-driven machine learning approaches. One possible reason is that such approaches need a training dataset, which must be built by manually matching some POIs. To mitigate this, we took advantage of the Cross-walk API, available at the time we started our project, which allowed us to retrieve already matched POI data from different sources in US territory. We trained and tested our model on a dataset containing Factual, Facebook and Foursquare POIs from New York City and successfully applied it to another dataset of Facebook and Foursquare POIs from Porto, Portugal, finding matches with an accuracy of around 95%. These encouraging results confirm our approach as an effective way to address the problem of automatically matching POIs. They also show that such a model can be trained with data available from multiple sources and applied to datasets from locations other than those used in training. Furthermore, because the approach is data-driven, the model can be continuously improved by adding new validated data to its training dataset.
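The abstract names neither the outlier detection algorithm nor the features used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes two hypothetical features per POI pair (name dissimilarity and distance in metres) and a simple one-sided z-score rule standing in for the outlier detector: the model is fitted on pairs known to match, and a candidate pair whose features fall far outside that distribution is treated as a non-match. All POI names and coordinates are made up for illustration.

```python
import math
import statistics
from difflib import SequenceMatcher


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pair_features(a, b):
    """Hypothetical feature vector: (name dissimilarity in [0, 1], distance in metres)."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return (1.0 - name_sim, haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]))


class OneSidedZScoreDetector:
    """Toy stand-in for an outlier detector (an assumption, not the paper's algorithm).

    A row is an outlier if any feature exceeds the training mean by more than
    `threshold` standard deviations. Only the "too dissimilar" side is flagged,
    so a pair that is closer than the training pairs is never rejected.
    """

    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.means = []
        self.stds = []

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [statistics.fmean(c) for c in columns]
        # Fall back to a tiny value so a constant feature does not divide by zero.
        self.stds = [statistics.pstdev(c) or 1e-9 for c in columns]
        return self

    def is_outlier(self, row):
        return any(
            (x - m) / s > self.threshold
            for x, m, s in zip(row, self.means, self.stds)
        )


def poi(name, lat, lon):
    return {"name": name, "lat": lat, "lon": lon}


# Made-up training pairs that play the role of already matched POIs.
MATCHED_PAIRS = [
    (poi("Corner Pizza", 40.7306, -74.0021), poi("Corner Pizza NYC", 40.7307, -74.0022)),
    (poi("Park Zoo", 40.7678, -73.9718), poi("The Park Zoo", 40.7677, -73.9720)),
    (poi("Old Book Store", 40.7333, -73.9910), poi("Old Bookstore", 40.7332, -73.9909)),
    (poi("Main St Deli", 40.7223, -73.9874), poi("Main Street Deli", 40.7222, -73.9873)),
]

detector = OneSidedZScoreDetector().fit(
    [pair_features(a, b) for a, b in MATCHED_PAIRS]
)


def is_match(a, b):
    """A candidate pair counts as a match when it is not an outlier."""
    return not detector.is_outlier(pair_features(a, b))


# Candidate pairs from a different city than the training data.
cafe = poi("Cafe Exemplo", 41.1470, -8.6066)
far_shop = poi("Livraria Exemplo", 41.1469, -8.6148)
print(is_match(cafe, cafe))      # True: identical name and location
print(is_match(cafe, far_shop))  # False: several hundred metres apart
```

A real system would likely use richer features (category, address, phone number) and a proper one-class model; the z-score rule is chosen here only because it runs with the standard library alone.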
