Venue: Advances in Natural Language Processing (conference)

A Web-Based Self-training Approach for Authorship Attribution

Abstract

As with any other text categorization task, authorship attribution requires a large number of training examples. Such examples are easy to obtain for most tasks but are particularly difficult to obtain in this case. Motivated by this, in this paper we investigate the possibility of using Web-based text mining methods to identify the author of a given poem. In particular, we propose a semi-supervised method that is specially suited to working with just a few training examples, in order to tackle the lack of data written in the same style. The method automatically extracts unlabeled examples from the Web and iteratively integrates them into the training data set. To the best of the authors' knowledge, a semi-supervised method that uses the Web as a supporting lexical resource has not previously been employed for this task. The results obtained on poem categorization show that this method may improve classification accuracy and that it is appropriate for handling the attribution of short documents.
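The abstract only outlines the method, so what follows is a minimal sketch of a generic self-training loop of the kind it describes, not the authors' implementation. Several details are assumptions: the Web-retrieval step is replaced by a fixed list `web_texts` (a hypothetical stand-in for the mined texts), the base classifier is a multinomial naive Bayes over bag-of-words tokens, and the names `self_train`, `threshold`, `per_round` and `max_rounds` are illustrative. In each round the classifier is retrained, unlabeled texts whose predicted-author posterior reaches `threshold` are ranked, and the top `per_round` of them are moved into the training set with their predicted labels.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased word tokens with surrounding punctuation stripped.
    # Assumption: plain bag-of-words features; the paper's feature set is not given here.
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (an assumed base classifier)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n_docs = len(docs)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_with_confidence(self, doc):
        # Log-score each author, then normalise to get the winning author's posterior.
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = math.log(self.prior[c] / self.n_docs)
            for w in tokenize(doc):
                s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = s
        best = max(scores, key=scores.get)
        z = sum(math.exp(s - scores[best]) for s in scores.values())
        return best, 1.0 / z


def self_train(labeled, unlabeled_pool, threshold=0.9, per_round=2, max_rounds=5):
    # labeled: list of (text, author) pairs.
    # unlabeled_pool: hypothetical stand-in for the texts the paper mines from the Web.
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled_pool)
    for _ in range(max_rounds):
        clf = NaiveBayes().fit(docs, labels)
        candidates = []
        for d in pool:
            author, conf = clf.predict_with_confidence(d)
            if conf >= threshold:
                candidates.append((conf, author, d))
        if not candidates:
            break  # nothing confident enough to add; stop iterating
        # Move the most confident predictions into the training set.
        candidates.sort(key=lambda t: t[0], reverse=True)
        for _, author, d in candidates[:per_round]:
            docs.append(d)
            labels.append(author)
            pool.remove(d)
    return NaiveBayes().fit(docs, labels), docs, labels


if __name__ == "__main__":
    labeled = [("The rose blooms red in spring.", "A"), ("The sea roars grey in winter.", "B")]
    web_texts = ["red rose spring blooms", "grey sea winter roars", "the in"]
    clf, docs, labels = self_train(labeled, web_texts)
    print(clf.predict_with_confidence("a red rose in spring")[0])  # A
```

Adding only high-confidence predictions limits the risk of feeding mislabeled texts back into the training set, which matters when the initial labeled set is very small; the threshold value used here is arbitrary. In the toy run, the ambiguous text "the in" scores a posterior of 0.5 for both authors and is never added.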