International Conference on Computational Linguistics

Geolocation Prediction in Social Media Data by Finding Location Indicative Words


Abstract

Geolocation prediction is vital to geospatial applications like localised search and local event detection. Social media geolocation models are predominantly based on full text data, including common words with no geospatial dimension (e.g. "today") and noisy strings (e.g. "tmrw"), which can hamper prediction and lead to slower, more memory-intensive models. In this paper, we focus on finding location indicative words (LIWs) via feature selection, and on establishing whether the reduced feature set boosts geolocation accuracy. Our results show that an information gain ratio-based approach surpasses other methods at LIW selection, outperforming state-of-the-art geolocation prediction methods by 10.6% in accuracy and reducing the mean and median prediction error distance by 45km and 209km, respectively, on a public dataset. We further formulate notions of prediction confidence, and demonstrate that performance is even higher in cases where our model is more confident, striking a trade-off between accuracy and coverage. Finally, the identified LIWs reveal regional language differences, which could potentially be useful for lexicographers.
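The abstract names information gain ratio (IGR) as the best-performing LIW selector but gives no formula. The sketch below shows one common way to compute IGR for a word, under stated assumptions: each training document (e.g. a user's aggregated tweets) is reduced to a set of tokens with a single location label, and the feature is simply "word occurs in the document". This is an illustrative formulation, not necessarily the paper's exact one; the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain_ratio(word, docs):
    """IGR of the binary feature 'word occurs in doc' w.r.t. location.

    Assumption: docs is a list of (set_of_tokens, location_label) pairs.
    """
    n = len(docs)
    all_locs = Counter(loc for _, loc in docs)
    with_w = Counter(loc for toks, loc in docs if word in toks)
    without_w = Counter(loc for toks, loc in docs if word not in toks)

    n_w = sum(with_w.values())
    p_w = n_w / n
    p_not = 1.0 - p_w

    # IG = H(C) - P(w) H(C | w) - P(not w) H(C | not w)
    ig = (entropy(all_locs.values())
          - p_w * entropy(with_w.values())
          - p_not * entropy(without_w.values()))
    # Intrinsic value (split information) of the present/absent split;
    # dividing by it penalises splits that are trivially lopsided.
    iv = entropy([n_w, n - n_w])
    return ig / iv if iv > 0 else 0.0


def rank_words(docs):
    """Return the vocabulary sorted by descending IGR (candidate LIWs first)."""
    vocab = set().union(*(toks for toks, _ in docs))
    return sorted(vocab, key=lambda w: information_gain_ratio(w, docs),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: regional words vs. location-neutral words.
    docs = [
        ({"yinz", "today"}, "pittsburgh"),
        ({"yinz", "game"}, "pittsburgh"),
        ({"hella", "today"}, "oakland"),
        ({"hella", "game"}, "oakland"),
    ]
    for w in rank_words(docs):
        print(w, information_gain_ratio(w, docs))
```

On this toy data the regional words score 1.0 and the location-neutral words ("today", "game") score 0.0, mirroring the abstract's point that common words carry no geospatial signal. In a full pipeline, only the top-ranked words would then be kept as the reduced feature set; how many to keep is a tuning choice not specified here.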
