
Literary writing style recognition via a minimal spanning tree-based approach



Abstract

In this paper, we address the problem of determining literary writing style by comparing the randomness of two given texts. We seek to establish whether the texts are generated from distinct probability sources, which would reveal a difference between the literary writing styles of the corresponding authors. We propose a new approach that incorporates the well-known Friedman-Rafsky two-sample test into a multistage procedure in order to stabilize the process. A sampling procedure built on the N-grams methodology simulates samples drawn from the pooled text in order to evaluate the null hypothesis distribution, which arises when the writing styles coincide. Next, samples from the different files are selected, and the p-values of the test statistics are calculated. The empirical distribution of these values is compared numerous times with the uniform distribution on the interval [0, 1], and the writing styles are recognized as different if the rejection fraction in this sequence of comparisons is significantly greater than 0.5. The proposed approach is language independent within the family of alphabetic languages and does not rely on linguistics. In contrast to most existing methods, our approach does not involve determining any authorship attributes. The text itself, or more precisely the distribution of sequential text templates and their mutual occurrences, essentially identifies the style. Experiments demonstrate the strong capability of the proposed method. (C) 2016 Elsevier Ltd. All rights reserved.
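The core of the Friedman-Rafsky two-sample test mentioned in the abstract can be illustrated with a minimal Python sketch. It builds a minimal spanning tree over the pooled samples and counts the edges that join points from different samples; few such edges suggest that the samples come from different sources. Several things here are assumptions made for illustration and are not taken from the paper: the samples are taken to be already encoded as numeric feature vectors (for example, N-gram frequency vectors), Euclidean distance is used, and the null distribution is approximated by permuting labels rather than by the paper's N-gram resampling of the pooled text. The function names `mst_edges`, `cross_edge_count`, and `permutation_p_value` are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Return the edges (i, j) of a minimal spanning tree of the complete
    Euclidean graph on `points`, using Prim's algorithm."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # distance from each vertex to the current tree
    best_from = [-1] * n         # tree vertex realizing that distance
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Attach the vertex outside the tree that is closest to it.
        k = min((j for j in range(n) if not in_tree[j]),
                key=lambda j: best_dist[j])
        edges.append((best_from[k], k))
        in_tree[k] = True
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[k], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = k
    return edges


def cross_edge_count(sample_a, sample_b):
    """Number of MST edges of the pooled sample that join a point of
    sample_a to a point of sample_b."""
    pooled = list(sample_a) + list(sample_b)
    m = len(sample_a)
    return sum((i < m) != (j < m) for i, j in mst_edges(pooled))


def permutation_p_value(sample_a, sample_b, n_perm=200, seed=0):
    """Approximate p-value: the share of random relabelings with at most as
    many cross-sample edges as observed. Assumption: a label-permutation
    null stands in for the paper's N-gram resampling of the pooled text."""
    rng = random.Random(seed)
    pooled = list(sample_a) + list(sample_b)
    m = len(sample_a)
    labels = [0] * m + [1] * (len(pooled) - m)
    edges = mst_edges(pooled)    # the tree does not depend on the labels
    observed = sum(labels[i] != labels[j] for i, j in edges)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if sum(labels[i] != labels[j] for i, j in edges) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Two clearly separated toy samples of hypothetical 2-D feature vectors.
style_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
style_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
print(cross_edge_count(style_a, style_b))
print(permutation_p_value(style_a, style_b))
```

In the multistage procedure described in the abstract, p-values of this kind would be computed for many sample pairs and their empirical distribution compared with the uniform distribution on [0, 1]; that later stage is not sketched here.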

Bibliographic details

  • Source
    Expert Systems with Applications | 2016, Issue 11 | pp. 145-153 | 9 pages
  • Author affiliations

    St Petersburg State Univ, Fac Math & Mech, Universitetsky Prospekt 28, St Petersburg 198504, Russia;

    St Petersburg State Univ, Fac Math & Mech, Universitetsky Prospekt 28, St Petersburg 198504, Russia|St Petersburg State Univ, Res Lab Anal & Modeling Social Proc, Universitetsky Prospekt 28, St Petersburg 198504, Russia;

    Charles Univ Prague, Dept Probabil & Stat, Prague, Czech Republic;

    ORT Braude Coll Engn, Software Engn Dept, IL-21982 Karmiel, Israel;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Writing style determination; Two-sample spanning Tree-based test;

  • Date added to database: 2022-08-17 13:29:42
