Conference: International Conference on Computational Linguistics

Cross-Topic Authorship Attribution: Will Out-Of-Topic Data Help?

Abstract

Most previous research on authorship attribution (AA) assumes that the training and test data are drawn from the same distribution. In real scenarios, however, this assumption is too strong. The goal of this study is to improve prediction results in cross-topic AA (CTAA), where the training data comes from one topic but the test data comes from another. Our proposed idea is to build a predictive model for one topic using documents from all other available topics. In addition to improving the performance of CTAA, we also thoroughly analyze how sensitive the four most commonly used feature types in AA are to changes in topic. We empirically show that our proposed framework is significantly better than a model trained on a single out-of-domain topic and is, in some cases, as effective as the same-topic setting.
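The core idea in the abstract, training a model for one topic on documents from all other available topics, can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the character trigram features, the cosine-similarity author profiles, the function names (`char_ngrams`, `train_profiles`, `predict`), and the toy documents. The abstract does not say which four feature types or which classifier the authors used.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train_profiles(docs, held_out_topic):
    """Build one n-gram profile per author from every topic except the held-out one.

    docs is an iterable of (author, topic, text) tuples (hypothetical layout).
    """
    profiles = {}
    for author, topic, text in docs:
        if topic == held_out_topic:
            continue  # the target topic is never seen during training
        profiles.setdefault(author, Counter()).update(char_ngrams(text))
    return profiles


def predict(profiles, text):
    """Return the author whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(profiles, key=lambda author: cosine(profiles[author], query))


if __name__ == "__main__":
    # Toy corpus, invented for this sketch only.
    docs = [
        ("alice", "politics", "Well, frankly, the vote was, I think, a mistake."),
        ("alice", "travel", "Well, frankly, the hotel was, I think, overpriced."),
        ("bob", "politics", "The vote failed; the result surprised nobody."),
        ("bob", "travel", "The hotel failed; the service surprised nobody."),
        ("alice", "sports", "Well, frankly, the match was, I think, dull."),
        ("bob", "sports", "The team failed; the score surprised nobody."),
    ]
    profiles = train_profiles(docs, held_out_topic="sports")
    for author, topic, text in docs:
        if topic == "sports":
            print(author, "->", predict(profiles, text))
```

Pooling the other topics into a single training set is what distinguishes this setup from training on one out-of-domain topic. Replacing the `continue` condition with a test that keeps only one source topic would give that single-topic baseline for comparison.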