Authorship attribution based on a probabilistic topic model

Jacques Savoy

Abstract

This paper describes, evaluates and compares the use of Latent Dirichlet Allocation (LDA) as an approach to authorship attribution. Based on this generative probabilistic topic model, we can model each document as a mixture of topic distributions with each topic specifying a distribution over words. Based on author profiles (aggregation of all texts written by the same writer) we suggest computing the distance with a disputed text to determine its possible writer. This distance is based on the difference between the two topic distributions. To evaluate different attribution schemes, we carried out an experiment based on 5408 newspaper articles (Glasgow Herald) written by 20 distinct authors. To complement this experiment, we used 4326 articles extracted from the Italian newspaper La Stampa and written by 20 journalists. This research demonstrates that the LDA-based classification scheme tends to outperform the Delta rule and the χ² distance, two classical approaches in authorship attribution based on a restricted number of terms. Compared to the Kullback-Leibler divergence, the LDA-based scheme can provide better effectiveness when considering a larger number of terms.
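The attribution step described in the abstract (compare the topic distribution of a disputed text with each author profile and pick the closest one) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which distance is used, so the L1 (absolute difference) distance here is an assumption, the topic distributions are taken as already inferred by some LDA implementation, and the names `l1_distance`, `attribute`, `author_a` and `author_b` as well as all numeric values are hypothetical.

```python
from typing import Dict, List


def l1_distance(p: List[float], q: List[float]) -> float:
    """Sum of absolute differences between two topic distributions.

    Assumption: the abstract only says the distance is "based on the
    difference between the two topic distributions"; L1 is one simple choice.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q))


def attribute(disputed: List[float], profiles: Dict[str, List[float]]) -> str:
    """Return the author whose profile is closest to the disputed text."""
    return min(profiles, key=lambda author: l1_distance(disputed, profiles[author]))


# Hypothetical topic distributions over 3 topics (illustrative values only,
# not taken from the paper). Each profile stands for the aggregation of all
# texts written by one author.
profiles = {
    "author_a": [0.70, 0.20, 0.10],
    "author_b": [0.10, 0.30, 0.60],
}
disputed = [0.60, 0.30, 0.10]

closest = attribute(disputed, profiles)
print(closest)
```

Any other distance between probability distributions could be swapped in for `l1_distance` without changing the nearest-profile logic.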

Bibliographic record

Source: Information Processing & Management, 2013, issue 1, pp. 341-354 (14 pages)
Author: Jacques Savoy
Affiliation: Computer Science Department, University of Neuchatel, Rue Emile Argand 11, 2000 Neuchatel, Switzerland
Original format: PDF
Language: English
Keywords: authorship attribution; text categorization; machine learning; lexical statistics
Date added to database: 2022-08-17 23:20:14

Similar literature

1. Authorship Attribution with Topic Models [J]. Yanir Seroussi, Ingrid Zukerman, Fabian Bohnert. Computational Linguistics, 2014, issue 2.
2. A Topic Drift Model for Authorship Attribution [J]. Yang Min, Chen Xiaojun, Tu Wenting. Neurocomputing, 2018 (Jan. 17).
3. Masking Topic-Related Information to Enhance Authorship Attribution [J]. Efstathios Stamatatos. Journal of the American Society for Information Science and Technology, 2018, issue 3.
4. Authorship Attribution via Evolutionary Hybridization of Sentiment Analysis, LIWC, and Topic Modeling Features [C]. Joshua Gaston, Mina Narayanan, Gerry Dozier. IEEE Symposium Series on Computational Intelligence, 2018.
5. Probabilistic Topic Modeling and Classification Probabilistic PCA for Text Corpora [D]. Cheng, Chi Wa. 2011.
6. Cross-Domain Authorship Attribution Using Pre-trained Language Models [O]. Georgios Barlas, Efstathios Stamatatos.
7. Authorship attribution based on a probabilistic topic model [O]. Savoy, Jacques. 2013.