Journal of Physics Communications

Taylor’s law for linguistic sequences and random walk models



Abstract

Taylor’s law describes the fluctuation characteristics underlying a complex system in which the variance of an event within a time span grows as a power law with respect to the mean. Although Taylor’s law has been applied to many natural and social systems, it has rarely been applied to language. This article describes a new, natural way to apply Taylor analysis to texts. The method was applied to over 1100 texts across 14 languages and showed that the Taylor exponents of written natural language texts were consistently around 0.58, and thus appear to be universal. The exponents were also evaluated for other language-related data, such as speech corpora (0.63 for adult speech, 0.68 for child-directed speech), programming language sources (0.79), and music (0.79). The results show how the Taylor exponent serves to quantify the fundamental structural complexity underlying linguistic time series. To explain why natural language sequences possess such different degrees of fluctuation, we investigated various mathematical models that could produce a Taylor exponent similar to that of real data. While the majority of previous probabilistic sequential models could not produce a Taylor exponent larger than 0.50, the value obtained in the independent and identically distributed (i.i.d.) case, random walk sequences on complex networks could produce stronger fluctuation. We show that, among various possibilities, random walks on a Barabási-Albert (BA) graph with small mean degree could fulfill the scaling properties of Zipf’s law and long-range correlation, in addition to having a Taylor exponent larger than 0.5, thus giving a new perspective from which to reconsider the nature of language.
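As a rough illustration of the kind of measurement the abstract describes, the sketch below estimates a Taylor exponent from a token sequence. It is a minimal sketch under stated assumptions, not the authors' procedure: the sequence is cut into non-overlapping windows, the per-window count of each word type gives a mean μ and a standard deviation σ, and the exponent is taken as the least-squares slope of log σ against log μ. Fitting on σ rather than on the variance is an assumption, chosen because it makes the i.i.d. baseline come out near the 0.50 quoted above. The names `taylor_exponent` and `iid_zipf_tokens`, the window size of 1000, and the synthetic Zipf-weighted data are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def taylor_exponent(tokens, window=1000):
    """Slope of log(sigma) vs log(mu) across word types (illustrative fit)."""
    n_windows = len(tokens) // window
    if n_windows < 2:
        raise ValueError("need at least two full windows")

    # Count every word type inside each non-overlapping window.
    counts = [
        Counter(tokens[i * window:(i + 1) * window]) for i in range(n_windows)
    ]
    totals = Counter()
    for c in counts:
        totals.update(c)

    xs, ys = [], []
    for word, total in totals.items():
        mu = total / n_windows
        # Population variance of the per-window counts (absent word counts as 0).
        var = sum((c[word] - mu) ** 2 for c in counts) / n_windows
        if var > 0:  # skip word types with no fluctuation (log undefined)
            xs.append(math.log(mu))
            ys.append(0.5 * math.log(var))  # log of the standard deviation

    # Ordinary least-squares slope in log-log space.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def iid_zipf_tokens(length, vocab, seed=0):
    """Synthetic i.i.d. tokens with Zipf-like (1/rank) weights; hypothetical data."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, vocab + 1)]
    return rng.choices(range(vocab), weights=weights, k=length)


alpha = taylor_exponent(iid_zipf_tokens(200_000, 1000), window=1000)
print(f"estimated Taylor exponent for i.i.d. tokens: {alpha:.2f}")
```

On the synthetic i.i.d. sequence the estimate should land close to 0.5, since independent draws give per-window counts whose variance is roughly proportional to the mean. Running the same function on the tokens of a real text would be the analogous measurement for written language, although the fitting details used in the article may differ.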
