
Rank Diversity of Languages: Generic Behavior in Computational Linguistics

Abstract

Statistical studies of languages have focused on the rank-frequency distribution of words. Here we instead introduce a measure of how word ranks change over time, which we call rank diversity. We calculate this diversity for books published in six European languages since 1800 and find that it follows a universal lognormal distribution. Based on the mean and standard deviation of the lognormal distribution, we define three word regimes: "heads" consist of words that almost never change their rank over time, "bodies" are words of general use, and "tails" consist of context-specific words whose ranks vary considerably over time. The heads and bodies reflect the size of the language cores that linguists have identified for basic communication. We propose a Gaussian random walk model that reproduces the variation of word ranks over time, and thus the rank diversity. Rank diversity can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied.
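As a rough illustration of the two ideas in the abstract, the following Python sketch computes rank diversity from ranked word lists and generates such lists with a toy Gaussian random walk. The abstract gives no formulas, so both parts rest on assumptions. Rank diversity d(k) is taken to be the number of distinct words seen at rank k across T time slices, divided by T. The walk perturbs a hypothetical per-word score with Gaussian noise whose standard deviation is proportional to the word's current rank, controlled by a hypothetical parameter `alpha`. This is not the authors' code or their exact model.

```python
"""Toy sketch of rank diversity and a rank-dependent Gaussian random walk.

Assumptions (not stated in the abstract): d(k) = (# distinct words at rank k
over T time slices) / T, and rank noise has std proportional to the rank.
"""
import random
from typing import Hashable, List, Sequence


def rank_diversity(ranked_lists: Sequence[Sequence[Hashable]]) -> List[float]:
    """Return d(k) for k = 1..K; each inner list is ordered by rank (index 0 = rank 1)."""
    T = len(ranked_lists)
    K = min(len(lst) for lst in ranked_lists)  # only ranks present in every slice
    # For each rank, count the distinct words that occupied it over time.
    return [len({lst[k] for lst in ranked_lists}) / T for k in range(K)]


def simulate_ranks(n_words: int, n_steps: int, alpha: float, seed: int = 0) -> List[List[int]]:
    """Hypothetical Gaussian random walk on ranks.

    Each word id has a continuous score; lower score = better rank. After the
    ranking at each step is recorded, every score receives Gaussian noise with
    std alpha * (current rank), so top-ranked words barely move and
    low-ranked (rare) words move a lot.
    """
    rng = random.Random(seed)
    scores = [float(r) for r in range(1, n_words + 1)]  # word w starts at rank w + 1
    history: List[List[int]] = []
    for _ in range(n_steps):
        order = sorted(range(n_words), key=lambda w: scores[w])
        history.append(order)  # order[k] is the word id at rank k + 1
        for k, w in enumerate(order):
            scores[w] += rng.gauss(0.0, alpha * (k + 1))
    return history


if __name__ == "__main__":
    toy = [["the", "of", "and"], ["the", "and", "of"], ["the", "of", "to"]]
    print(rank_diversity(toy))  # d = [1/3, 2/3, 1.0]
    d = rank_diversity(simulate_ranks(n_words=200, n_steps=50, alpha=0.05))
    print(d[:5], d[-5:])  # compare diversity at low vs. high ranks
```

With `alpha > 0` one would expect d(k) to stay small at low ranks and grow toward high ranks, qualitatively echoing the head/body/tail picture. Fitting the lognormal and placing regime boundaries is left out, since the abstract does not say how the mean and standard deviation translate into cut-offs.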