Conference: Workshop on Noisy User-generated Text

Identifying Linguistic Areas for Geolocation

Abstract

Geolocating social media posts relies on the assumption that language carries sufficient geographic information. However, locations are usually given as continuous latitude/longitude tuples, so we first need to define discrete geographic regions that can serve as labels. Most studies use some form of clustering to discretize the continuous coordinates (Han et al., 2016). However, the resulting regions do not always correspond to existing linguistic areas. Consequently, accuracy at 100 miles tends to be good, but degrades for finer-grained distinctions, when different linguistic regions get lumped together. We describe a new algorithm, Point-to-City (P2C), an iterative k-d tree-based method for clustering geographic coordinates and associating them with towns. We create three sets of labels at different levels of granularity, and compare the performance of a state-of-the-art geolocation model trained and tested with P2C labels to one with regular k-d tree labels. Even though P2C results in substantially more labels than the baseline, model accuracy increases significantly over using traditional labels at the fine-grained level, while staying comparable at 100 miles. The results suggest that identifying meaningful linguistic areas is crucial for improving geolocation at a fine-grained level.
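To make the baseline concrete, the following is a minimal sketch of a plain k-d tree discretization of coordinates, together with an accuracy-within-100-miles check. It is not the P2C algorithm, whose details the abstract does not give. The function names, the `max_leaf_size` parameter, the median split rule, and the Earth radius constant are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def kd_tree_labels(points: List[Point], max_leaf_size: int) -> List[int]:
    """Assign each point the id of the k-d tree leaf that contains it.

    Hypothetical baseline: split at the median, alternating between
    latitude and longitude, until a cell holds at most max_leaf_size points.
    """
    if max_leaf_size < 1:
        raise ValueError("max_leaf_size must be at least 1")
    labels = [-1] * len(points)
    next_label = 0

    def split(indices: List[int], depth: int) -> None:
        nonlocal next_label
        if len(indices) <= max_leaf_size:
            # Small enough: this cell becomes one discrete region label.
            for i in indices:
                labels[i] = next_label
            next_label += 1
            return
        axis = depth % 2  # 0 = latitude, 1 = longitude
        ordered = sorted(indices, key=lambda i: points[i][axis])
        mid = len(ordered) // 2
        split(ordered[:mid], depth + 1)
        split(ordered[mid:], depth + 1)

    split(list(range(len(points))), 0)
    return labels


def haversine_miles(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def accuracy_within(pred: List[Point], gold: List[Point],
                    miles: float = 100.0) -> float:
    """Fraction of predictions within `miles` of the true location."""
    hits = sum(haversine_miles(p, g) <= miles for p, g in zip(pred, gold))
    return hits / len(gold)


if __name__ == "__main__":
    coords = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(kd_tree_labels(coords, max_leaf_size=2))
    print(accuracy_within([(0.0, 0.0), (0.0, 0.0)], [(0.0, 1.0), (0.0, 5.0)]))
```

Cells produced this way depend only on point density, not on towns or dialect boundaries, which is the mismatch the abstract attributes to clustering-based labels.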