Using Google Books Ngram in Detecting Linguistic Shifts over Time

Abstract

The availability of large historical corpora, such as Google Books Ngram, makes it possible to extract a range of meta-information about the evolution of human languages. Combined with advances in machine learning techniques, these corpora have recently been used to track cultural and linguistic shifts in words and terms over time. In this paper, we develop a new approach to quantitatively recognize semantic changes of words during the period between 1800 and 1990. We use the state-of-the-art FastText approach to construct word embeddings from the Google Books Ngram corpus for each decade in the period 1800-1990. We use a time series analysis to identify words that show a statistically significant change in the period between 1900 and 1990. We conduct a performance evaluation to compare our approach against related work and show that our system is more robust to morphological language variations.
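The abstract does not spell out the time series analysis, so the following is only a minimal sketch of one common way such a pipeline can be set up, not the authors' method. It assumes that per-decade word vectors have already been trained and mapped into a shared vector space (an assumption), measures each word's cosine distance from its first-decade vector, and standardizes that drift against the rest of the vocabulary. The function names and the toy vectors are hypothetical and purely illustrative.

```python
import numpy as np


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def drift_series(vectors_by_decade):
    """Distance of a word's vector in each decade from its first-decade vector.

    vectors_by_decade maps a decade (e.g. 1900) to that word's vector.
    Assumes all decades share one vector space (hypothetical pre-alignment).
    """
    decades = sorted(vectors_by_decade)
    base = vectors_by_decade[decades[0]]
    series = np.array([cosine_distance(base, vectors_by_decade[d]) for d in decades])
    return decades, series


def z_scores_across_vocab(series_by_word):
    """Standardize each decade's drift against all words in that decade."""
    words = list(series_by_word)
    mat = np.vstack([series_by_word[w] for w in words])
    mean = mat.mean(axis=0)
    std = mat.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero where no word varies
    z = (mat - mean) / std
    return {w: z[i] for i, w in enumerate(words)}


# Synthetic toy vectors, invented for illustration only.
toy = {
    "shifting": {1900: [1.0, 0.0], 1910: [0.0, 1.0]},
    "stable": {1900: [1.0, 0.0], 1910: [1.0, 0.0]},
}
drifts = {word: drift_series(vecs)[1] for word, vecs in toy.items()}
scores = z_scores_across_vocab(drifts)
```

A word whose standardized drift stays high in later decades would be a candidate for semantic change; the threshold and the significance test applied to such a series are design choices that the abstract leaves open.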