Not All Character N-grams Are Created Equal: A Study in Authorship Attribution

Abstract

Character n-grams have been identified as the most successful feature in both single-domain and cross-domain Authorship Attribution (AA), but the reasons for their discriminative value were not fully understood. We identify subgroups of character n-grams that correspond to linguistic aspects commonly claimed to be covered by these features: morpho-syntax, thematic content and style. We evaluate the predictiveness of each of these groups in two AA settings: a single domain setting and a cross-domain setting where multiple topics are present. We demonstrate that character n-grams that capture information about affixes and punctuation account for almost all of the power of character n-grams as features. Our study contributes new insights into the use of n-grams for future AA work and other classification tasks.
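To make the idea of n-gram subgroups concrete, here is a minimal Python sketch that tags each character n-gram with a coarse category. The function name `typed_char_ngrams`, the four labels (`prefix`, `suffix`, `punct`, `other`), and the tagging rules are simplifying assumptions chosen for illustration; the abstract does not spell out the exact category definitions used in the study.

```python
import string
from collections import Counter

PUNCT = set(string.punctuation)

def typed_char_ngrams(text, n=3):
    """Count character n-grams, each tagged with a coarse category.

    The categories are hypothetical simplifications, not the paper's own:
      punct  - the n-gram contains at least one punctuation character
      prefix - all letters, starts a word, and the word continues after it
      suffix - all letters, ends a word, and the word started before it
      other  - everything else (whole words, mid-word, spans whitespace)
    """
    counts = Counter()
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if any(ch in PUNCT for ch in gram):
            label = "punct"
        elif gram.isalpha():
            # Look at the neighbouring characters to find word boundaries.
            before = text[i - 1] if i > 0 else " "
            after = text[i + n] if i + n < len(text) else " "
            starts_word = not before.isalpha()
            ends_word = not after.isalpha()
            if starts_word and not ends_word:
                label = "prefix"
            elif ends_word and not starts_word:
                label = "suffix"
            else:
                label = "other"
        else:
            label = "other"
        counts[(label, gram)] += 1
    return counts

counts = typed_char_ngrams("The actor reacted, acting.")
print(counts[("prefix", "act")], counts[("other", "act")])
```

Because the label is part of the key, the same string (here `act`) is counted separately when it opens a word and when it sits in the middle of one. Grouping counts by label would then allow each subgroup to be evaluated as its own feature set, in the spirit of the comparison the abstract describes.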

Bibliographic details

Source
Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies | 2015 | pp. 93-102 | 10 pages
Authors
Upendra Sapkota; Steven Bethard; Manuel Montes-y-Gomez; Thamar Solorio
Original format: PDF
