Journal: Neural Computing & Applications

A comparative study on authorship attribution classification tasks using both neural network and statistical methods

Abstract

This paper investigates the application of the multi-layer perceptron (MLP) to the task of categorizing texts by their authors' style. This task is of particular importance for information retrieval applications involving very large document databases. The emphasis is on determining the extent to which the MLP model can be fine-tuned to analyse such data successfully, uncovering the stylistic differences among authors. The MLP-based method is compared and contrasted with statistical techniques, such as discriminant analysis, that are widely used in stylistic studies. The methods are compared on their classification performance, to provide an objective evaluation of the advantages of each. A second aim of the study is to compare, for each method, the effectiveness of distinct features in uncovering author identity. To evaluate the effectiveness of the entire approach in greater depth, the results of the proposed MLP-based method are compared with those of established approaches, such as support vector machines (SVM), using both the original parameters employed by the MLP and term frequency–inverse document frequency (TF–IDF) parameters, and the cascade correlation approach. The proposed MLP-based approach is found to possess a number of advantages, such as high classification accuracy, broadly comparable to that of the SVM, coupled with the ability to algorithmically reduce the set of parameters used without adversely affecting classification accuracy.
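
The abstract names TF–IDF parameters as one of the feature sets given to the SVM, but it does not state which weighting formula was used. As a minimal sketch, assuming the common definition in which term frequency is the count divided by document length and inverse document frequency is log(N / df), the weights could be computed as below. The function name `tf_idf`, the whitespace tokenization, and the sample texts are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting (not specified in the abstract):
    tf = count / document length, idf = log(N / df).
    """
    # Naive lowercase whitespace tokenization, chosen for brevity only.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy texts, not data from the study.
docs = [
    "the sea was calm and the sky was clear",
    "the market fell and the traders left",
    "the sea rose and the storm broke",
]
vectors = tf_idf(docs)
```

Under this particular weighting, a term that occurs in every text (here "the" and "and") receives a weight of zero, while a term confined to one text, such as "market", receives the largest idf factor.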