Authorship Attribution in Russian with New High-Performing and Fully Interpretable Morpho-Syntactic Features

Abstract

This work tackles the problem of modeling author style in Russian. In particular, we solve the task of authorship attribution using a collected dataset of 1506 texts by 30 authors, written between the 18th and 21st centuries. We apply several classifiers to the attribution problem: Random Forest, Logistic Regression, and SVM. In terms of text representation, we use seven models at three language levels: lexis, morphology, and syntax. Most importantly, we propose our own set of morpho-syntactic features that perform at about the same level as doc2vec, but are fully interpretable. Our experiments show that these features are effective on their own and that classification quality increases when they are combined with the classic doc2vec-based approach. All code, including feature extraction, is made freely available. Additionally, we analyze the performance of individual features as style markers. Finally, we study classification errors in order to identify patterns in the misattribution of specific authors.
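
The general idea of attributing authorship from interpretable features can be illustrated with a minimal sketch. It is not the authors' method: the three surface features below merely stand in for the paper's morpho-syntactic features (which this record does not list), a nearest-centroid rule stands in for Random Forest, Logistic Regression, or SVM, and all function names, author labels, and toy sentences are hypothetical.

```python
import math
import re
from collections import defaultdict


def style_features(text):
    """Return a few interpretable surface features for one text.

    Assumption: these are illustrative stand-ins, not the paper's features.
    """
    sentences = [s for s in re.split(r"[.!?…]+", text) if s.strip()]
    words = re.findall(r"\w+", text)  # \w matches Cyrillic letters in Python 3
    n_words = max(len(words), 1)
    return {
        "mean_sentence_len": len(words) / max(len(sentences), 1),
        "mean_word_len": sum(len(w) for w in words) / n_words,
        "commas_per_word": text.count(",") / n_words,
    }


def centroids(labelled_texts):
    """Average the feature vectors of each author's training texts."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for author, text in labelled_texts:
        counts[author] += 1
        for name, value in style_features(text).items():
            sums[author][name] += value
    return {
        author: {name: total / counts[author] for name, total in feats.items()}
        for author, feats in sums.items()
    }


def attribute(text, author_centroids):
    """Return the author whose centroid is closest in Euclidean distance."""
    feats = style_features(text)

    def distance(centroid):
        return math.sqrt(sum((feats[k] - centroid[k]) ** 2 for k in feats))

    return min(author_centroids, key=lambda a: distance(author_centroids[a]))


# Toy training data, invented for this sketch (not taken from the dataset).
train = [
    ("author_a", "Шёл дождь. Было темно. Он ждал."),
    ("author_a", "Она ушла. Дом опустел. Стало тихо."),
    ("author_b", "Когда дождь, начавшийся ещё утром, наконец утих, "
                 "мы, немного помедлив, вышли в сад."),
    ("author_b", "Дом, в котором они жили, стоял на холме, откуда, "
                 "если не было тумана, виднелась река."),
]
model = centroids(train)
short_style = attribute("Он встал. Окно было открыто. Пахло дождём.", model)
long_style = attribute(
    "Сад, который мы видели из окна, казался, как всегда осенью, "
    "пустым и тихим.",
    model,
)
print(short_style, long_style)
```

Because each feature has a plain meaning (words per sentence, letters per word, commas per word), a misattribution can be traced back to the feature values that caused it, which is the kind of interpretability the abstract contrasts with doc2vec embeddings.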

Bibliographic record

Source
International Conference on Analysis of Images, Social Networks, and Texts | 2019 | 426p | 12 pages
Authors
Elena Pimonova; Oleg Durandin; Alexey Malafeev
Original format: PDF
Chinese Library Classification: TP393-53
Keywords
Authorship attribution; Author style; Text classification; Text representation; Morpho-syntactic features; Language feature engineering; Machine learning; Natural language processing
