International AAAI Conference on Web and Social Media

Freshman or Fresher? Quantifying the Geographic Variation of Language in Online Social Media

Abstract

In this paper we present a new computational technique to detect and analyze statistically significant geographic variation in language. While previous approaches have primarily focused on lexical variation between regions, our method identifies words that demonstrate semantic and syntactic variation as well. We extend recently developed techniques for neural language models to learn word representations which capture differing semantics across geographical regions. In order to quantify this variation and ensure robust detection of true regional differences, we formulate a null model to determine whether observed changes are statistically significant. Our method is the first such approach to explicitly account for random variation due to chance while detecting regional variation in word meaning. To validate our model, we study and analyze two different massive online data sets: millions of tweets from Twitter as well as millions of phrases contained in the Google Book Ngrams. Our analysis reveals interesting facets of language change across countries.
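The abstract names two ingredients: measuring how far a word's learned representation in one region is from its representation in another, and checking that difference against a null model. The Python sketch below illustrates both under stated assumptions. It is not the authors' implementation. It assumes the regional vectors already share one vector space (for example, trained jointly), uses cosine distance as the measure of semantic difference, and assumes a hypothetical null in which distances come from embeddings retrained on data whose region labels were randomly shuffled. All function names are illustrative.

```python
import math
from typing import Dict, List, Mapping, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def regional_distances(
    emb_a: Mapping[str, Vector], emb_b: Mapping[str, Vector]
) -> Dict[str, float]:
    """Cosine distance of every word present in both regions.

    ASSUMPTION: both mappings already live in one shared vector space;
    independently trained embeddings would first need to be aligned.
    """
    shared = sorted(set(emb_a) & set(emb_b))
    return {w: cosine_distance(emb_a[w], emb_b[w]) for w in shared}


def empirical_p_value(observed: float, null_distances: Sequence[float]) -> float:
    """One-sided empirical p-value with add-one smoothing:
    (number of null draws >= observed, plus 1) / (number of draws, plus 1)."""
    exceed = sum(1 for d in null_distances if d >= observed)
    return (exceed + 1) / (len(null_distances) + 1)


def significant_words(
    observed: Mapping[str, float],
    null: Mapping[str, Sequence[float]],
    alpha: float = 0.05,
) -> List[str]:
    """Words whose observed regional distance is unlikely under the null.

    HYPOTHETICAL null: null[w] holds distances for w obtained from
    embeddings retrained on data with region labels randomly shuffled.
    No multiple-testing correction is applied in this sketch.
    """
    return [
        w for w, d in observed.items()
        if w in null and empirical_p_value(d, null[w]) < alpha
    ]
```

As a toy illustration with made-up numbers: if "freshman" points in orthogonal directions in two regions (distance 1.0) and all 99 of its null distances are below 0.1, `empirical_p_value` returns 0.01 and the word is flagged at `alpha=0.05`. Add-one smoothing keeps the p-value from reaching zero, which matters when only a modest number of shuffled-label retrainings is affordable.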