Long-range correlations and burstiness in written texts: Universal and language-specific aspects

Constantoudis Vassilios; Kalimeri Maria; Diakonos Fotis; Karamanos Konstantinos; Papadimitriou Constantinos; Chatzigeorgiou Manolis; Papageorgiou Harris

Abstract

Recently, methods from the statistical physics of complex systems have been applied successfully to identify universal features in the long-range correlations (LRCs) of written texts. In real texts, however, these universal features are intermingled with language-specific influences. This paper aims to characterize and better understand the interplay between universal and language-specific effects on the LRCs in texts. To this end, we apply the language-sensitive mapping of written texts to word-length series (wls) and analyse large parallel corpora (corpora of the same content) from 10 languages classified into four families (Romance, Germanic, Greek and Uralic). The autocorrelation functions of the wls reveal tiny but persistent LRCs that decay at large scales following a power law with a language-independent exponent of approximately 0.60-0.65. The impact of language shows up in the amplitude of the correlations, where a relative standard deviation > 40% among the analysed languages is observed. The classification into language families seems to play a significant role, since the Finnish and Germanic languages exhibit stronger correlations than the Greek and Romance families. To reveal the origins of the LRCs, we focus on the long words and perform burst and correlation analysis of their positions along the corpora. We find that the universal features are linked more to the correlations of the inter-long-word distances, while the language-specific aspects are related more to their distributions.
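The analysis steps named in the abstract (mapping a text to a word-length series, computing its autocorrelation function, estimating a power-law decay exponent, and examining the distances between long words) can be illustrated with a minimal Python sketch. All function names, the regular-expression tokenizer, the long-word threshold, and the choice of the burstiness parameter B = (σ − μ)/(σ + μ) are assumptions made here for illustration; the abstract does not specify how the authors implemented any of these steps.

```python
import re
import numpy as np

def word_length_series(text):
    """Map a text to the sequence of its word lengths in characters.
    Tokenization by runs of letters is an assumption of this sketch."""
    words = re.findall(r"[^\W\d_]+", text)
    return np.array([len(w) for w in words], dtype=float)

def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(k) for lags k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / ((n - k) * var)
                     for k in range(max_lag + 1)])

def power_law_exponent(lags, acf):
    """Estimate b in C(k) ~ k**(-b) by a straight-line fit in log-log space.
    Only positive lags with positive correlation values are used."""
    lags = np.asarray(lags, dtype=float)
    acf = np.asarray(acf, dtype=float)
    mask = (lags > 0) & (acf > 0)
    slope, _ = np.polyfit(np.log(lags[mask]), np.log(acf[mask]), 1)
    return -slope

def long_word_gaps(wls, threshold):
    """Distances between consecutive words of length >= threshold.
    The threshold is a hypothetical parameter of this sketch."""
    positions = np.flatnonzero(np.asarray(wls) >= threshold)
    return np.diff(positions)

def burstiness(gaps):
    """Burstiness parameter B = (sigma - mu) / (sigma + mu) of the gaps:
    -1 for perfectly regular spacing, near 0 for Poisson-like spacing,
    approaching 1 for strongly bursty spacing."""
    gaps = np.asarray(gaps, dtype=float)
    mu, sigma = gaps.mean(), gaps.std()
    return (sigma - mu) / (sigma + mu)
```

On a real corpus one would compute `autocorrelation(word_length_series(text), max_lag)` for each language, fit the exponent over the large-lag range only, and compare both the fitted exponents and the correlation amplitudes across languages.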


Bibliographic details

Source
International Journal of Modern Physics, B. Condensed Matter Physics, Statistical Physics, Applied Physics | 2016, Issue 15 | 13 pages
Authors
Constantoudis Vassilios; Kalimeri Maria; Diakonos Fotis; Karamanos Konstantinos; Papadimitriou Constantinos; Chatzigeorgiou Manolis; Papageorgiou Harris
Original format: PDF
Text language: English
Chinese Library Classification: Statistical physics; Solid-state physics; Applied physics
Keywords
Long-range correlations; burstiness; language; language families; universality; power-law;


Similar literature

1. Long-range correlations and burstiness in written texts: Universal and language-specific aspects [J]. Constantoudis Vassilios, Kalimeri Maria, Diakonos Fotis, International Journal of Modern Physics, B. Condensed Matter Physics, Statistical Physics, Applied Physics. 2016, Issue 15
2. Measuring Dynamic Correlations of Words in Written Texts with an Autocorrelation Function [J]. Hiroshi Ogura, Hiromi Amano, Masato Kondo, Journal of Data Analysis and Information Processing. 2019, Issue 2
3. Origin of Dynamic Correlations of Words in Written Texts [J]. Hiroshi Ogura, Hiromi Amano, Masato Kondo, Journal of Data Analysis and Information Processing. 2019, Issue 4
4. Long-Range Correlations and Universality in Plasma Edge Turbulence [C]. B. Ph. van Milligen, B.A. Carreras, M.A. Pedrosa, Fusion energy 1998. 1998
5. Children's use of universal and language-specific cues in verb learning [D]. Maguire, Mandy J. 2004
6. Hierarchical structures induce long-range dynamical correlations in written texts [O]. E. Alvarez-Lacalle, B. Dorow, J.-P. Eckmann, 2006
7. Hierarchical structures induce long-range dynamical correlations in written texts [O]. Alvarez-Lacalle, E., Dorow, B., Eckmann, J.-P., 2006
