Mind your corpus: systematic errors in authorship attribution

Abstract

In computational stylistics, any influence of unwanted noise (e.g. caused by an untidily prepared corpus) might lead to biased or false results. Relying on contaminated data is similar to using dirty test tubes in a laboratory: it inescapably means falling into systematic error. An important question is what degree of nonchalance is acceptable to obtain sufficiently reliable results. The present study attempts to verify the impact of unwanted noise in a series of experiments conducted on several corpora of English, German, Polish, Ancient Greek, and Latin prose texts. In 100 iterations, a given corpus was gradually damaged, and controlled tests for authorship were applied. The first experiment was designed to show the correlation between a dirty corpus and attribution accuracy. The second aimed to test how disorder in word frequencies (produced by scribal and/or editorial modifications) affects the attribution abilities of particular corpora. The goal of the third experiment was to test how much 'authorial' data a given text needs for its authorial fingerprint to be traced through a mass of external quotations.
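
The damage-and-test loop described in the abstract can be illustrated with a small sketch. The abstract does not specify the noise model or the attribution method, so everything here is an assumption made for illustration: noise is simulated by replacing a growing share of letters with random ones, and attribution uses a nearest-profile comparison of word frequencies under Manhattan distance. The function names (`damage`, `profile`, `attribute`, `accuracy_curve`) are hypothetical and are not taken from the paper.

```python
import random
import string
from collections import Counter


def damage(text, proportion, rng):
    """Replace roughly `proportion` of the letters in `text` with random
    lowercase letters (assumed noise model; a replacement may by chance
    equal the original letter)."""
    chars = list(text)
    letter_positions = [i for i, c in enumerate(chars) if c.isalpha()]
    n_damaged = round(proportion * len(letter_positions))
    for i in rng.sample(letter_positions, n_damaged):
        chars[i] = rng.choice(string.ascii_lowercase)
    return "".join(chars)


def profile(text, vocabulary):
    """Relative frequency of each vocabulary word in `text`."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in vocabulary]


def attribute(test_text, training, vocabulary):
    """Return the author whose training profile is closest to the test
    text (Manhattan distance; an assumed stand-in classifier)."""
    target = profile(test_text, vocabulary)

    def distance(author):
        ref = profile(training[author], vocabulary)
        return sum(abs(a - b) for a, b in zip(target, ref))

    return min(training, key=distance)


def accuracy_curve(training, tests, vocabulary, steps=100, seed=0):
    """Damage the whole corpus a little more at each step and record
    the share of test texts attributed to their true author."""
    rng = random.Random(seed)
    curve = []
    for step in range(steps):
        level = step / steps  # 0.00, 0.01, ... for steps=100
        noisy_training = {a: damage(t, level, rng) for a, t in training.items()}
        hits = sum(
            attribute(damage(text, level, rng), noisy_training, vocabulary) == author
            for author, text in tests
        )
        curve.append((level, hits / len(tests)))
    return curve
```

Here `training` maps an author name to a reference text and `tests` is a list of `(true_author, text)` pairs. Plotting the returned `(damage level, accuracy)` pairs gives the kind of noise-versus-accuracy curve the first experiment is concerned with, under the assumptions stated above.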

Bibliographic record

  • Source
    Literary & Linguistic Computing | 2013, issue 4 | pp. 603-614 | 12 pages
  • Author

    Maciej Eder

  • Affiliation

    Pedagogical University, Krakow, Poland; Polish Academy of Sciences, Institute of Polish Language, Krakow, Poland; Institute of Polish Studies, Pedagogical University of Krakow, ul. Podchorążych 2, 30-084 Krakow, Poland

  • Original format: PDF
  • Text language: English
