Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2017

Modeling intra-textual variation with entropy and surprisal: topical vs. stylistic patterns

Abstract

We present a data-driven approach to investigate intra-textual variation by combining entropy and surprisal. With this approach we detect linguistic variation based on phrasal lexico-grammatical patterns across sections of research articles. Entropy is used to detect patterns typical of specific sections. Surprisal is used to differentiate between more and less informationally loaded patterns as well as between types of information (topical vs. stylistic). While we focus here on research articles in biology/genetics, the methodology is especially interesting for digital humanities scholars, as it can be applied to any text type or domain and combined with additional variables (e.g. time, author or social group) to obtain insights into intra-textual variation.
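The abstract does not say which entropy measure or language model is used, so what follows is only a minimal Python sketch of the two quantities under stated assumptions. It assumes (1) that "typical of a section" can be read off the Shannon entropy of a pattern's frequency distribution across sections, where low entropy means the pattern is concentrated in few sections, and (2) that surprisal is -log2 P(word | previous word) under a bigram model with add-one smoothing. The function names `section_entropy` and `bigram_surprisal` and all counts and texts are hypothetical, and the topical vs. stylistic distinction is not modelled here.

```python
import math
from collections import Counter


def section_entropy(counts_by_section):
    """Shannon entropy (in bits) of one pattern's frequency distribution across sections.

    Assumption (not from the paper): low entropy = pattern concentrated in few
    sections (section-typical); high entropy = pattern spread evenly across sections.
    """
    total = sum(counts_by_section.values())
    entropy = 0.0
    for count in counts_by_section.values():
        if count > 0:  # 0 * log(0) is taken as 0
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def bigram_surprisal(tokens, corpus_tokens):
    """Surprisal -log2 P(w_i | w_{i-1}) for every token after the first.

    Assumption (not from the paper): a bigram model with add-one (Laplace)
    smoothing estimated on corpus_tokens stands in for the real language model.
    """
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    # Count each word only where it is followed by another word (its use as a history).
    histories = Counter(corpus_tokens[:-1])
    vocab_size = len(set(corpus_tokens) | set(tokens))
    scores = []
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size)
        scores.append((word, -math.log2(p)))
    return scores


if __name__ == "__main__":
    # Hypothetical counts of one lexico-grammatical pattern per article section.
    pattern_counts = {"Introduction": 2, "Methods": 40, "Results": 5, "Discussion": 3}
    print(round(section_entropy(pattern_counts), 3))

    # Hypothetical toy corpus; a usable model would need far more text.
    corpus = "the gene is expressed the gene is mutated".split()
    print(bigram_surprisal(["the", "gene", "is", "mutated"], corpus))
```

Under this toy model, a higher surprisal value marks a word that is less predictable from its context and hence more informationally loaded; estimates from a corpus this small are illustrative only.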