Identifying Linguistic Areas for Geolocation


Abstract

Geolocating social media posts relies on the assumption that language carries sufficient geographic information. However, locations are usually given as continuous latitude/longitude tuples, so we first need to define discrete geographic regions that can serve as labels. Most studies use some form of clustering to discretize the continuous coordinates (Han et al., 2016). However, the resulting regions do not always correspond to existing linguistic areas. Consequently, accuracy at 100 miles tends to be good, but degrades for finer-grained distinctions, when different linguistic regions get lumped together. We describe a new algorithm, Point-to-City (P2C), an iterative k-d tree-based method for clustering geographic coordinates and associating them with towns. We create three sets of labels at different levels of granularity, and compare the performance of a state-of-the-art geolocation model trained and tested with P2C labels to one with regular k-d tree labels. Even though P2C results in substantially more labels than the baseline, model accuracy increases significantly over using traditional labels at the fine-grained level, while staying comparable at 100 miles. The results suggest that identifying meaningful linguistic areas is crucial for improving geolocation at a fine-grained level.
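
The abstract contrasts P2C with regular k-d tree labels and reports accuracy at 100 miles. The following is a minimal sketch of those two baseline ingredients only, not of P2C itself, whose details are not given here. It assumes median splits that alternate between latitude and longitude, a bucket-size stopping rule, and haversine distance on a spherical Earth. The names `kd_labels`, `haversine_miles`, `accuracy_within`, and `max_bucket` are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical approximation


def kd_labels(points, max_bucket):
    """Label (lat, lon) points by recursive median splits, k-d tree style.

    Assumption: a region stops splitting once it holds at most
    `max_bucket` points. Each final region gets one integer label.
    """
    if max_bucket < 1:
        raise ValueError("max_bucket must be at least 1")
    labels = [None] * len(points)
    next_label = 0

    def split(indices, depth):
        nonlocal next_label
        if len(indices) <= max_bucket:
            for i in indices:
                labels[i] = next_label
            next_label += 1
            return
        axis = depth % 2  # 0 = latitude, 1 = longitude
        ordered = sorted(indices, key=lambda i: points[i][axis])
        mid = len(ordered) // 2
        split(ordered[:mid], depth + 1)
        split(ordered[mid:], depth + 1)

    split(list(range(len(points))), 0)
    return labels


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def accuracy_within(predicted, gold, miles=100.0):
    """Share of predictions that fall within `miles` of the gold location."""
    hits = sum(haversine_miles(p, g) <= miles for p, g in zip(predicted, gold))
    return hits / len(gold)


# Illustrative coordinates only: Milan, Turin, Rome, Naples.
cities = [(45.46, 9.19), (45.07, 7.69), (41.90, 12.50), (40.85, 14.27)]
labels = kd_labels(cities, max_bucket=2)
```

Regions produced this way follow point density alone, which is why, as the abstract notes, they need not line up with existing linguistic areas.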

Bibliographic details

Source
Workshop on noisy user-generated text | 2019 | pp. 231-236 | 6 pages
Authors
Tommaso Fornaciari; Dirk Hovy
Original format
PDF

Similar documents

1. Geolocation, geolocation, geolocation [J]. Satellite Evolution Asia. 2009, No. 4

2. Forensic pollen geolocation techniques used to identify the origin of boll weevil re-infestation [J]. Gretchen D. Jones. Grana. 2012, No. 3

3. Forensic pollen geolocation techniques used to identify the origin of boll weevil re-infestation [J]. Gretchen D. Jones. Grana: An International Journal of Palynology and Aerobiology. 2012, No. 3

4. Identifying Linguistic Areas for Geolocation [C]. Tommaso Fornaciari, Dirk Hovy. Workshop on noisy user-generated text. 2019

5. Who is you? Identifying "you" in second-person narratives: A systemic functional linguistics analysis [D]. Kittrell, Davina. 2013

6. Unraveling City-Specific Microbial Signatures and Identifying Sample Origins for the Data From CAMDA 2020 Metagenomic Geolocation Challenge [O]. Runzhi Zhang, Dorothy Ellis, Alejandro R. Walker. 2021

7. Identifying Linguistic Areas for Geolocation [O]. Tommaso Fornaciari, Dirk Hovy. 2019

8. Identifying Universal Linguistic Features Associated with Veracity and Deception [R]. Matsumoto, D., Hwang, H. C. 2015
