Authors' Writing Styles Based Authorship Identification System Using the Text Representation Vector

Abstract

Text mining is one of the main and typical tasks of machine learning (ML). Authorship identification (AI) is a standard research subject in text mining and natural language processing (NLP) that has evolved remarkably in recent years. The task is to determine the actual author of an anonymous text on the basis of a set of writing samples. Standard text classification often relies on many handcrafted features such as dictionaries, knowledge bases, and various stylometric characteristics, which often leads to very high dimensionality. Unlike traditional approaches, this paper proposes an authorship identification approach based on automatic feature engineering using word2vec word embeddings, taking into account each author's writing style. The system includes two learning phases. The first stage generates the semantic representation of each author by using word2vec to learn and extract the most relevant characteristics of the raw document. The second stage applies a multilayer perceptron (MLP) classifier, trained with the backpropagation learning algorithm, to learn the classification rules. Experiments show that the MLP classifier with the word2vec model achieves an accuracy of 95.83% on an English corpus, suggesting that the word2vec word embedding model can clearly improve identification accuracy compared to classical models such as n-gram frequencies and bag of words.
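The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the vectors in `EMB` are hand-made stand-ins for trained word2vec embeddings, averaging word vectors into one document vector is an assumed pooling choice that the abstract does not specify, and `doc_vector`, `TinyMLP`, and the toy documents are hypothetical names and data introduced only for this example.

```python
import math
import random


def doc_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens; zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


class TinyMLP:
    """One tanh hidden layer and a softmax output, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max logit for numerical stability
        e = [math.exp(v - m) for v in z]
        s = sum(e)
        return h, [v / s for v in e]

    def train_step(self, x, label, lr=0.3):
        h, p = self.forward(x)
        # Cross-entropy gradient w.r.t. the output logits: p - one_hot(label).
        dz = [pi - (1.0 if k == label else 0.0) for k, pi in enumerate(p)]
        # Backpropagate through w2 and tanh before w2 is updated.
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz))) * (1 - h[j] ** 2)
              for j in range(len(h))]
        for k, g in enumerate(dz):
            for j in range(len(h)):
                self.w2[k][j] -= lr * g * h[j]
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i in range(len(x)):
                self.w1[j][i] -= lr * g * x[i]
            self.b1[j] -= lr * g

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)


# Hypothetical 2-dimensional embeddings standing in for trained word2vec vectors.
EMB = {
    "whilst": [1.0, 0.1], "hence": [0.9, 0.0], "thus": [0.8, 0.2],
    "gonna": [0.0, 1.0], "kinda": [0.1, 0.9], "yeah": [0.2, 0.8],
}
# Toy corpus: (tokens, author id). Words missing from EMB are skipped.
DOCS = [
    (["whilst", "hence", "the"], 0),
    (["thus", "whilst"], 0),
    (["gonna", "yeah"], 1),
    (["kinda", "gonna", "a"], 1),
]

# Stage 1: turn each document into a fixed-length vector.
X = [doc_vector(tokens, EMB, 2) for tokens, _ in DOCS]
y = [author for _, author in DOCS]

# Stage 2: fit the classifier on the document vectors.
model = TinyMLP(n_in=2, n_hidden=4, n_out=2)
for _ in range(300):
    for xi, yi in zip(X, y):
        model.train_step(xi, yi)

predictions = [model.predict(xi) for xi in X]
```

In practice the embeddings would be learned from the corpus with a word2vec implementation and the classifier would come from an established library; the sketch only shows how fixed-length text representation vectors feed a backpropagation-trained MLP.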

Bibliographic details

Source
International Multi-Conference on Systems, Signals & Devices | 2019 | 775 p. | 6 pages
Authors
Nacer Eddine Benzebouchi; Nabiha Azizi; Nacer Eddine Hammami; Didier Schwab; Mohammed Chiheb Eddine Khelaifia; Monther Aldwairi
Original format: PDF
Chinese Library Classification: TM-532
Keywords
Authorship Identification; Text Mining; Natural Language Processing; Word2Vec; MLP classifier
Date indexed: 2022-08-21 10:44:49

Similar literature

1. A framework for authorship identification of online messages: Writing-style features and classification techniques [J]. Zheng R, Li JX, Chen HC. Journal of the American Society for Information Science and Technology. 2006, No. 3

2. Geometric Identification and Control of Nonlinear Dynamic Systems Based on Floating Basis Vector Representation [J]. Jozsef K. Tar, Imre J. Rudas, Miklos Ronto. Journal of Advanced Computational Intelligence and Intelligent Informatics. 2006, No. 4

3. The Email Author Identification System Based on Support Vector Machine (SVM) and Analytic Hierarchy Process (AHP) [J]. Qinghe Zheng, Xinyu Tian, Mingqiang Yang. IAENG International Journal of Computer Science. 2019, No. 2PTa141a263

4. Authors' Writing Styles Based Authorship Identification System Using the Text Representation Vector [C]. Nacer Eddine Benzebouchi, Nabiha Azizi, Nacer Eddine Hammami. International Multi-Conference on Systems, Signals & Devices. 2019

5. Authorship's Wake: Writing After the Death of the Author [D]. Sayers, Philip Christopher Gore. 2018

6. Authorship. Changing authorship system might be counterproductive. [O]. T. Scott. 1997

7. Authorship Arabic Text Detection According to Style of Writing by Using (SABA) Method [O]. Tareef Kamil Mustafa, Ammar Adil Abdul Razzaq, Ehsan Ali Al-Zubaidi. 2017
