Authorship Attribution and Verification with Many Authors and Limited Data



Abstract

Most studies in statistical or machine-learning-based authorship attribution focus on two or a few authors. This leads to an overestimation of the importance of the features that are extracted from the training data and found to be discriminating for these small sets of authors. Most studies also use training data sizes that are unrealistic for situations in which stylometry is applied (e.g., forensics), and thereby overestimate the accuracy of their approach in these situations. A more realistic interpretation of the task is as an authorship verification problem, which we approximate by pooling data from many different authors as negative examples. In this paper, we show, on the basis of a new corpus with 145 authors, the effect of many authors on feature selection and learning, and show the robustness of a memory-based learning approach to authorship attribution and verification with many authors and limited training data, compared to eager learning methods such as SVMs and maximum entropy learning.
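The abstract frames verification as a one-vs-rest problem: the target author's texts are positive examples, texts pooled from many other authors are negatives, and a memory-based (lazy) learner decides by similarity to stored examples. The Python sketch below illustrates that setup using a k-nearest-neighbour vote, the usual form of memory-based learning. Everything specific here is an assumption for illustration only: character-trigram relative frequencies as features, cosine similarity, the k value, the function names, and the toy corpus TOY_CORPUS. It is not the feature set, learner configuration, or 145-author corpus used by Luyckx and Daelemans.

```python
import math
from collections import Counter

# Hypothetical toy corpus: author label -> list of texts. Not data from the paper.
TOY_CORPUS = {
    "A": ["the cat sat on the mat", "the cat ate the rat", "the cat is on the mat"],
    "B": ["quick brown foxes jump", "lazy dogs sleep all day"],
    "C": ["zebra zone xylophone", "yellow yak yodels"],
}


def char_ngrams(text, n=3):
    """Relative frequencies of character n-grams (an assumed feature set)."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {gram: count / total for gram, count in grams.items()}


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_verification_set(docs_by_author, target):
    """Label the target author's texts positive; pool all other authors' texts as negatives."""
    examples = []
    for author, docs in docs_by_author.items():
        for doc in docs:
            examples.append((char_ngrams(doc), author == target))
    return examples


def knn_verify(examples, text, k=3):
    """Lazy (memory-based) decision: majority vote among the k most similar stored examples.

    "Training" is only storing `examples`; all work happens at query time.
    Ties in similarity keep corpus order because Python's sort is stable.
    """
    query = char_ngrams(text)
    ranked = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
    positive_votes = sum(1 for _, is_target in ranked[:k] if is_target)
    return positive_votes * 2 > k


if __name__ == "__main__":
    examples = build_verification_set(TOY_CORPUS, target="A")
    print(knn_verify(examples, "the cat sat on the rat"))      # True
    print(knn_verify(examples, "quick brown dogs jump", k=1))  # False
```

In this sketch, adding texts from another author to the negative pool only means appending to `examples`; no model is refit. Whether such a learner is more robust than eager learners with many authors and little data is the paper's empirical claim, which a toy example like this cannot demonstrate.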


Bibliographic Details

Source
22nd International Conference on Computational Linguistics | 2008 | pp. 513-520 | 8 pages
Conference location: Manchester (GB)
Authors
Kim Luyckx; Walter Daelemans
Affiliations

CNTS Language Technology Group, University of Antwerp, Prinsstraat 13, 2000 Antwerp, Belgium (both authors)

Original format: PDF
Text language: English
Chinese Library Classification: programming, software engineering

Similar Documents


1. Authorship Attribution of Short Historical Arabic Texts using Stylometric Features and a KNN Classifier with Limited Training Data [J]. Fatma Howedi, Masnizah Mohd, Zahra Aborawi Aborawi. Journal of computer sciences, 2020, no. 10.

2. Tri-Training for authorship attribution with limited training data: a comprehensive study [J]. Qian Tieyun, Liu Bing, Chen Li. Neurocomputing, 2016, issue JANa1.

3. Authorship Attribution and Verification with Many Authors and Limited Data [C]. International Conference on Computational Linguistics, 2008.

4. Network Data Analysis of Word Graphs with Applications to Authorship Attribution [D]. Leonard, Timothy. 2018.

5. Authorship. Guidelines exist on ownership of data and authorship in multicentre collaborations [O]. A. Barker, R. A. Powell. 1997.

6. Authorship attribution and verification with many authors and limited data (2008) [O]. Kim Luyckx, Walter Daelemans. 2013.
