International Conference on Analysis of Images, Social Networks, and Texts

Authorship Attribution in Russian with New High-Performing and Fully Interpretable Morpho-Syntactic Features


Abstract

This work tackles the problem of modeling author style in Russian. In particular, we address authorship attribution on a collected dataset of 1506 texts by 30 authors, written between the 18th and 21st centuries. We apply several approaches to the attribution problem: Random Forest, Logistic Regression, and an SVM classifier. For text representation, we use seven models at three language levels: lexis, morphology, and syntax. Most importantly, we propose our own set of morpho-syntactic features that perform at about the same level as doc2vec but are fully interpretable. Our experiments show that these features are effective on their own and that classification quality improves when they are used together with the classic doc2vec-based approach. All code, including feature extraction, is made freely available. Additionally, we analyze the performance of individual features as style markers. Finally, we study classification errors to identify patterns in the misattribution of specific authors.
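The abstract does not list the proposed features, so the following is only a minimal sketch of what an interpretable morpho-syntactic representation could look like and how it might be joined with a dense document vector. The tag inventory `POS_TAGS`, the helper names `pos_features` and `combine`, the adjective-noun bigram marker, and the toy input are all assumptions made for illustration; none of them is taken from the paper or its released code.

```python
from collections import Counter

# Hypothetical tag inventory; a real system would take it from its tagger.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "CONJ"]


def pos_features(tagged_tokens):
    """Map a list of (token, pos) pairs to named relative-frequency features."""
    tags = [pos for _, pos in tagged_tokens]
    total = len(tags) or 1  # avoid division by zero on empty texts
    unigrams = Counter(tags)
    bigrams = Counter(zip(tags, tags[1:]))
    n_bigrams = max(len(tags) - 1, 1)

    features = {}
    for tag in POS_TAGS:
        # Share of tokens carrying this part-of-speech tag.
        features[f"share_{tag}"] = unigrams[tag] / total
    # One illustrative syntactic-flavoured marker: adjective followed by noun.
    features["share_ADJ_NOUN"] = bigrams[("ADJ", "NOUN")] / n_bigrams
    return features


def combine(features, dense_vector):
    """Concatenate named features with a dense document vector.

    Returns the sorted feature names (so the first len(names) positions stay
    interpretable) and the concatenated vector.
    """
    names = sorted(features)
    return names, [features[n] for n in names] + list(dense_vector)


# Toy, hand-tagged input (not from the paper's dataset).
text = [("старый", "ADJ"), ("дом", "NOUN"), ("тихо", "ADV"), ("стоял", "VERB")]
feats = pos_features(text)
# The two trailing values stand in for a doc2vec embedding of the same text.
names, vector = combine(feats, [0.1, -0.3])
```

Because every position in the first part of the vector has a name, a classifier's per-feature weights or importances can be read back as statements about style, which is the sense in which such features are interpretable while the dense part is not.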
