Royal Society Open Science

Heaps’ Law and Heaps functions in tagged texts: evidences of their linguistic relevance

Abstract

We study the relationship between vocabulary size and text length in a corpus of 75 literary works in English, authored by six writers, distinguishing between the contributions of three grammatical classes (or ‘tags,’ namely, , and ), and analyse the progressive appearance of new words of each tag along each individual text. We find that, as prescribed by Heaps’ Law, vocabulary sizes and text lengths follow a well-defined power-law relation. Meanwhile, the appearance of new words in each text does not obey a power law, and is on the whole well described by the average of random shufflings of the text. Deviations from this average, however, are statistically significant and show systematic trends across the corpus. Specifically, we find that the appearance of new words along each text is predominantly retarded with respect to the average of random shufflings. Moreover, different tags add systematically distinct contributions to this tendency, with and being respectively more and less retarded than the mean trend, and following instead the overall mean. These statistical systematicities are likely to point to the existence of linguistically relevant information stored in the different variants of Heaps’ Law, a feature that is still in need of extensive assessment.
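
The comparison described in the abstract, between the appearance of new words along a text and the average over random shufflings of that text, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`heaps_function`, `shuffled_mean`, `retardation`), the number of shufflings, and the sign convention for the deviation are all assumptions made for illustration, and the input is assumed to be an already tokenized text.

```python
import random


def heaps_function(tokens):
    """Return V(n), the number of distinct tokens among the first n, for n = 1..N."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def shuffled_mean(tokens, n_shuffles=200, seed=0):
    """Average the Heaps function over random shufflings of the same tokens.

    n_shuffles and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    work = list(tokens)
    totals = [0] * len(work)
    for _ in range(n_shuffles):
        rng.shuffle(work)
        for i, v in enumerate(heaps_function(work)):
            totals[i] += v
    return [t / n_shuffles for t in totals]


def retardation(tokens, n_shuffles=200, seed=0):
    """Shuffled mean minus observed curve (assumed sign convention).

    Positive values mean new words appear later in the actual text
    than they do, on average, in its random shufflings.
    """
    observed = heaps_function(tokens)
    mean = shuffled_mean(tokens, n_shuffles=n_shuffles, seed=seed)
    return [m - o for o, m in zip(observed, mean)]


if __name__ == "__main__":
    text = "the cat saw the dog and the dog saw the bird".split()
    print(heaps_function(text))
    print([round(d, 2) for d in retardation(text)])
```

Both curves necessarily coincide at the first and last positions, since every ordering starts with one distinct word and ends with the full vocabulary; any deviation shows up only in between.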