Cross-Topic Authorship Attribution: Will Out-Of-Topic Data Help?

Abstract

Most previous research on authorship attribution (AA) assumes that the training and test data are drawn from the same distribution. In real scenarios, this assumption is too strong. The goal of this study is to improve prediction results in cross-topic AA (CTAA), where the training data comes from one topic but the test data comes from another. Our proposed idea is to build a predictive model for one topic using documents from all other available topics. In addition to improving the performance of CTAA, we also thoroughly analyze how sensitive the four most commonly used feature types in AA are to changes in topic. We show empirically that our proposed framework is significantly better than one trained on a single out-of-domain topic and is, in some cases, as effective as the same-topic setting.
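The training setup described in the abstract (model one topic using documents from all other available topics) can be sketched as a leave-one-topic-out split. The following is a minimal illustration only: the abstract does not name the learner or the four feature types, so character 3-gram counts and a cosine nearest-profile classifier are swapped in as stand-ins, and every function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (assumed feature type)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def leave_one_topic_out(docs, target_topic):
    """Split (author, topic, text) triples: train on all topics but the target."""
    train = [d for d in docs if d[1] != target_topic]
    test = [d for d in docs if d[1] == target_topic]
    return train, test


def build_profiles(train):
    """Pool n-gram counts per author across every training topic."""
    profiles = defaultdict(Counter)
    for author, _topic, text in train:
        profiles[author].update(char_ngrams(text))
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0


def predict(profiles, text):
    """Return the author whose pooled profile is closest to the text."""
    feats = char_ngrams(text)
    return max(profiles, key=lambda author: cosine(profiles[author], feats))


# Hypothetical toy corpus: (author, topic, text).
docs = [
    ("author_a", "politics", "well, I reckon the vote went well, I reckon"),
    ("author_b", "politics", "the ballot was, frankly, a dull affair"),
    ("author_a", "sports", "well, I reckon the match went badly"),
    ("author_b", "sports", "the game was, frankly, thrilling"),
    ("author_a", "travel", "well, I reckon the coast is lovely"),
    ("author_b", "travel", "the island was, frankly, overrated"),
]

# Test on "travel" using a model built from every other topic.
train, test = leave_one_topic_out(docs, "travel")
profiles = build_profiles(train)
predictions = [(author, predict(profiles, text)) for author, _topic, text in test]
```

Repeating the split once per topic, with each topic held out in turn, gives one cross-topic evaluation per topic; the single out-of-domain baseline mentioned in the abstract would instead restrict `train` to documents from just one other topic.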

Bibliographic details

Source
International Conference on Computational Linguistics, 2014, pp. 1228-1237 (10 pages)
Authors
Upendra Sapkota; Thamar Solorio; Manuel Montes-y-Gomez; Steven Bethard; Paolo Rosso
Original format: PDF

Similar documents

1. Document embeddings learned on various types of n-grams for cross-topic authorship attribution [J]. Helena Gómez-Adorno, Juan-Pablo Posadas-Durán, Grigori Sidorov. Computing. 2018, No. 7
2. Bucketed common vector scaling for authorship attribution in heterogeneous web collections: A scaling approach for authorship attribution [J]. Hayri Volkan Agun, Ozgur Yilmazel. Journal of Information Science. 2020, No. 5
3. A Two-Stage Authorship Attribution Method Using Text and Structured Data for De-Anonymizing User-Generated Content [J]. Matthew J. Schneider, Shawn Mankad. Customer Needs and Solutions. 2021, No. 3
4. Cross-Topic Authorship Attribution: Will Out-Of-Topic Data Help? [C]. Upendra Sapkota, Thamar Solorio, Manuel Montes-y-Gomez. International Conference on Computational Linguistics. 2014
5. Network Data Analysis of Word Graphs with Applications to Authorship Attribution [D]. Leonard, Timothy. 2018
6. Authorship attribution of source code by using back propagation neural network based on particle swarm optimization [O]. Xinyu Yang, Guoai Xu, Qi Li. 2011
7. (A) Data in the Life: Authorship Attribution in Lennon-McCartney Songs [O]. Mark Glickman, Jason Brown, Ryan Song. 2019
