Journal: Proceedings of the National Academy of Sciences of the United States of America

Hierarchical structures induce long-range dynamical correlations in written texts


Abstract

Thoughts and ideas are multidimensional and often concurrent, yet they can be expressed surprisingly well sequentially by the translation into language. This reduction of dimensions occurs naturally but requires memory and necessitates the existence of correlations, e.g., in written text. However, correlations in word appearance decay quickly, while previous observations of long-range correlations using random walk approaches yield little insight on memory or on semantic context. Instead, we study combinations of words that a reader is exposed to within a "window of attention" spanning about 100 words. We define a vector space of such word combinations by looking at words that co-occur within the window of attention, and analyze its structure. Singular value decomposition of the co-occurrence matrix identifies a basis whose vectors correspond to specific topics, or "concepts" that are relevant to the text. As the reader follows a text, the "vector of attention" traces out a trajectory of directions in this "concept space." We find that memory of the direction is retained over long times, forming power-law correlations. The appearance of power laws hints at the existence of an underlying hierarchical network. Indeed, imposing a hierarchy similar to that defined by volumes, chapters, paragraphs, etc. succeeds in creating correlations in a surrogate random text that are identical to those of the original text. We conclude that hierarchical structures in text serve to create long-range correlations, and use the reader's memory in reenacting some of the multidimensionality of the thoughts being expressed.
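The pipeline outlined in the abstract (word counts in a window of attention, singular value decomposition of a co-occurrence matrix, and correlations of the resulting direction in concept space) can be illustrated with a minimal Python sketch. This is one plausible construction under stated assumptions, not the authors' procedure: the tokenizer, the frequency-based vocabulary, the un-normalized co-occurrence matrix `counts.T @ counts`, and every function name and parameter below (`top_vocab`, `window_counts`, `concept_trajectory`, `direction_autocorrelation`, `n_concepts`, `step`) are hypothetical choices made for illustration.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_vocab(words, size):
    """Assumed vocabulary choice: the `size` most frequent words."""
    return [w for w, _ in Counter(words).most_common(size)]


def window_counts(words, vocab, window=100, step=100):
    """Return a (n_windows, len(vocab)) matrix of word counts per window.

    Each row plays the role of a "vector of attention" for one window.
    """
    index = {w: i for i, w in enumerate(vocab)}
    starts = range(0, len(words) - window + 1, step)
    counts = np.zeros((len(starts), len(vocab)))
    for row, start in enumerate(starts):
        for w in words[start:start + window]:
            col = index.get(w)
            if col is not None:
                counts[row, col] += 1
    return counts


def concept_trajectory(counts, n_concepts=10):
    """Project window vectors onto leading singular vectors, as unit vectors.

    Assumption: two words co-occur when they share a window, so the
    word-by-word co-occurrence matrix is counts.T @ counts.
    """
    cooc = counts.T @ counts
    u, _, _ = np.linalg.svd(cooc)      # columns of u: candidate "concepts"
    basis = u[:, :n_concepts]
    traj = counts @ basis
    norms = np.linalg.norm(traj, axis=1, keepdims=True)
    norms[norms == 0] = 1.0            # leave empty windows as zero vectors
    return traj / norms


def direction_autocorrelation(traj, max_lag):
    """Mean dot product between directions `lag` windows apart.

    Requires max_lag to be smaller than the number of windows.
    """
    return np.array([
        np.mean(np.sum(traj[:-lag] * traj[lag:], axis=1))
        for lag in range(1, max_lag + 1)
    ])
```

On a real text, one would tokenize the book, build the count matrix with a window of about 100 words, and plot the output of `direction_autocorrelation` against the lag on log-log axes; a roughly straight line would be consistent with the power-law decay the abstract reports.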
