Authorship Attribution and Verification with Many Authors and Limited Data


Abstract

Most studies in statistical or machine learning based authorship attribution focus on two or a few authors. This leads to an overestimation of the importance of the features extracted from the training data and found to be discriminating for these small sets of authors. Most studies also use sizes of training data that are unrealistic for situations in which stylometry is applied (e.g., forensics), and thereby overestimate the accuracy of their approach in these situations. A more realistic interpretation of the task is as an authorship verification problem that we approximate by pooling data from many different authors as negative examples. In this paper, we show, on the basis of a new corpus with 145 authors, what the effect is of many authors on feature selection and learning, and show robustness of a memory-based learning approach in doing authorship attribution and verification with many authors and limited training data when compared to eager learning methods such as SVMs and maximum entropy learning.
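The verification setup described in the abstract can be illustrated with a small sketch: one target author's texts are the positive examples, texts from all other authors are pooled as negatives, and a memory-based (k-nearest-neighbour) learner stores every training instance and compares a new text against them at query time. This is only an illustration under stated assumptions, not the authors' implementation: the character trigram features, the cosine similarity, the toy corpus, and the function names `char_ngrams`, `build_verification_set`, and `verify` are all hypothetical choices made for this example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Relative frequencies of character n-grams (an assumed feature set)."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}


def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_verification_set(corpus, target):
    """Label the target author's texts 1 and pool every other author as 0."""
    return [(char_ngrams(text), 1 if author == target else 0)
            for author, text in corpus]


def verify(memory, text, k=1):
    """Majority vote over the k stored instances most similar to the query."""
    query = char_ngrams(text)
    neighbours = sorted(memory, key=lambda item: cosine(query, item[0]),
                        reverse=True)[:k]
    votes = sum(label for _, label in neighbours)
    return votes * 2 > len(neighbours)


# Hypothetical toy corpus: (author, text) pairs invented for illustration.
corpus = [
    ("author_a", "the rain kept falling while we walked home along the river"),
    ("author_a", "we walked home slowly while the rain kept falling on the river"),
    ("author_b", "stock prices dropped sharply after quarterly earnings missed forecasts"),
    ("author_c", "mix flour, butter and sugar, then bake until golden brown"),
    ("author_d", "the committee postponed its vote on the zoning proposal"),
]

# All training instances are kept in memory; nothing is abstracted away.
memory = build_verification_set(corpus, "author_a")
print(verify(memory, "the rain was falling as we walked along the river"))
```

Pooling the other authors into a single negative class turns the many-author attribution task into a binary decision per target author, which is the approximation to verification that the abstract describes. The sketch makes no claim about accuracy; with so little data the decision depends heavily on the chosen features and on `k`.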


Bibliographic details

Source: International Conference on Computational Linguistics, 2008, 8 pages
Authors: Kim Luyckx, Walter Daelemans
Original format: PDF
Chinese Library Classification: Programming, software engineering

Similar literature


1. Authorship Attribution of Short Historical Arabic Texts using Stylometric Features and a KNN Classifier with Limited Training Data [J]. Fatma Howedi, Masnizah Mohd, Zahra Aborawi Aborawi. Journal of computer sciences, 2020, No. 10.

2. Tri-Training for authorship attribution with limited training data: a comprehensive study [J]. Qian Tieyun, Liu Bing, Chen Li. Neurocomputing, 2016, No. JANa1.

3. Authorship Attribution and Verification with Many Authors and Limited Data [C]. Kim Luyckx, Walter Daelemans. 22nd International Conference on Computational Linguistics, 2008.

4. Network Data Analysis of Word Graphs with Applications to Authorship Attribution [D]. Leonard, Timothy. 2018.

5. Authorship. Guidelines exist on ownership of data and authorship in multicentre collaborations. [O]. A. Barker, R. A. Powell. 1997.

6. 2008) Authorship attribution and verification with many authors and limited data [O]. Kim Luyckx, Walter Daelemans. 2013.
