Conference: International Conference on Computational Linguistics

Cross-Topic Authorship Attribution: Will Out-Of-Topic Data Help?

Abstract

Most previous research on authorship attribution (AA) assumes that the training and test data are drawn from the same distribution. In real scenarios, however, this assumption is too strong. The goal of this study is to improve prediction results in cross-topic AA (CTAA), where the training data comes from one topic but the test data comes from another. Our proposed idea is to build a predictive model for one topic using documents from all other available topics. In addition to improving the performance of CTAA, we also thoroughly analyze how sensitive the four most commonly used feature types in AA are to changes in topic. We show empirically that our proposed framework is significantly better than a model trained on a single out-of-domain topic and is, in some cases, as effective as the same-topic setting.
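The idea of training on every topic except the target one can be illustrated with a minimal sketch. The abstract does not name the features or the classifier, so the character trigram counts, the cosine-similarity nearest-profile rule, the function names, and the toy corpus below are all assumptions chosen for illustration, not the authors' setup.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_profiles(docs, held_out_topic):
    """Build one n-gram profile per author from all topics except the held-out one.

    docs is a list of (author, topic, text) tuples (hypothetical layout).
    """
    profiles = {}
    for author, topic, text in docs:
        if topic == held_out_topic:
            continue  # the target topic is never seen during training
        profiles.setdefault(author, Counter()).update(char_ngrams(text))
    return profiles


def predict(profiles, text):
    """Return the author whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(profiles, key=lambda author: cosine(profiles[author], query))


# Toy corpus (invented for illustration): two authors, three topics.
docs = [
    ("alice", "sports", "Honestly, the match was, honestly, rather dull."),
    ("alice", "politics", "Honestly, the debate was, honestly, rather long."),
    ("bob", "sports", "What a game!!! What a finish!!!"),
    ("bob", "politics", "What a vote!!! What a mess!!!"),
    ("alice", "travel", "Honestly, the trip was rather pleasant."),
    ("bob", "travel", "What a beach!!! What a view!!!"),
]

# Train on sports and politics only, then attribute the unseen travel texts.
profiles = train_profiles(docs, held_out_topic="travel")
guess_a = predict(profiles, "Honestly, the trip was rather pleasant.")
guess_b = predict(profiles, "What a beach!!! What a view!!!")
```

Pooling several out-of-topic sources in `train_profiles` is the part that mirrors the abstract. The single out-of-domain baseline it is compared against would correspond to keeping only one training topic instead of all remaining ones.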