Conference: World multi-conference on systemics, cybernetics and informatics

Using Statistical Properties for Author Identification

Abstract

Languages in general are highly redundant, which makes text highly compressible. In this paper, the redundancy of the English language is exploited to predict the author of an English text. The method starts by training the system on texts with known authors. The distinct blocks in the texts written by each author are determined. Those blocks are then filtered to produce, for each author, a set of unique blocks that occur in that author's writings but not in the other authors' texts. In normal operation, the text to be categorized is processed to determine its distinct blocks. Comparing this set of distinct blocks with each author's set of unique blocks identifies the author. The method worked successfully in text classification and author categorization, and because it was tested on both English and Arabic texts, it has the potential to be a universal method.
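The training and categorization steps described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define a "block", so the sketch assumes a block is a fixed-length run of characters, and it assumes the comparison is a simple count of shared blocks. The function names `distinct_blocks`, `train`, and `predict`, the `size` parameter, and the toy corpus are all hypothetical.

```python
from collections import Counter


def distinct_blocks(text, size=4):
    """Return the set of distinct character blocks of length `size`.

    Assumption: a "block" is a fixed-length character run; the abstract
    does not specify the block definition.
    """
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + size] for i in range(len(normalized) - size + 1)}


def train(corpus, size=4):
    """Map each author to the blocks found only in that author's texts."""
    per_author = {}
    for author, texts in corpus.items():
        blocks = set()
        for text in texts:
            blocks |= distinct_blocks(text, size)
        per_author[author] = blocks
    # Count how many authors' block sets contain each block.
    owners = Counter(block for blocks in per_author.values() for block in blocks)
    # Keep only the blocks that belong to exactly one author.
    return {
        author: {block for block in blocks if owners[block] == 1}
        for author, blocks in per_author.items()
    }


def predict(unique_blocks, text, size=4):
    """Return the best-matching author and the per-author overlap counts.

    Assumption: the score is the number of the text's distinct blocks that
    appear in an author's unique set; ties go to the first author listed.
    """
    blocks = distinct_blocks(text, size)
    scores = {author: len(blocks & unique) for author, unique in unique_blocks.items()}
    return max(scores, key=scores.get), scores


# Toy corpus, invented purely for illustration.
corpus = {
    "author_a": ["the quick brown fox jumps over the lazy dog"],
    "author_b": ["a stitch in time saves nine"],
}
model = train(corpus)
author, scores = predict(model, "the lazy dog sleeps")
print(author, scores)
```

Counting shared blocks is the simplest possible comparison. With training texts of unequal length, a normalized score (for example, the fraction of the text's blocks found in each author's unique set) may be a fairer choice.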
