Journal of the American Society for Information Science and Technology

Masking Topic-Related Information to Enhance Authorship Attribution


Abstract

Authorship attribution attempts to reveal the authors of documents. In recent years, research in this field has grown rapidly. However, the performance of state-of-the-art methods is heavily affected when texts of known authorship and texts under investigation differ in topic and/or genre. So far, it is not clear how to quantify the personal style of authors in a way that is not affected by topic shifts or genre variations. In this paper, a set of text distortion methods is used to mask topic-related information. These methods transform the input texts into a more topic-neutral form while maintaining the structure of documents associated with the personal style of the author. Using a controlled corpus that includes a fine-grained range of topics and genres, we demonstrate how the proposed approach can be combined with existing authorship attribution methods to enhance their performance in very challenging tasks, especially in cross-topic attribution. We also examine cross-genre attribution and the most challenging, yet realistic, cross-topic-and-genre attribution scenarios, and show how the proposed techniques should be tuned to enhance performance in these tasks. Finally, we demonstrate that there are important differences in attribution effectiveness when conversational genres, nonconversational genres, or a mix of the two are considered.
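
To make the idea of text distortion concrete, the following is a minimal sketch of one plausible masking scheme. The abstract does not specify the exact methods, so everything here is an assumption for illustration: the function names `top_k_words` and `mask_topic_words`, the choice to treat the k most frequent words as topic-neutral, and the use of one asterisk per masked letter are all hypothetical. The intent is only to show how topic-bearing words might be hidden while word lengths, punctuation, and function words (the structure associated with style) are kept.

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z]+")


def top_k_words(texts, k):
    """Return the k most frequent lowercase words across the given texts.

    Assumption: very frequent words are mostly function words and
    therefore carry little topic information.
    """
    counts = Counter(w.lower() for t in texts for w in WORD.findall(t))
    return {w for w, _ in counts.most_common(k)}


def mask_topic_words(text, keep):
    """Replace every word not in `keep` with asterisks of the same length.

    Punctuation, whitespace, and digits are left untouched, so the
    overall shape of the document is preserved.
    """
    def repl(match):
        word = match.group(0)
        return word if word.lower() in keep else "*" * len(word)

    return WORD.sub(repl, text)


if __name__ == "__main__":
    keep = {"the", "of", "is"}  # hypothetical topic-neutral vocabulary
    print(mask_topic_words("The price of oil is rising.", keep))
```

The distorted text could then be passed to an existing attribution method (for example, one based on character n-grams) in place of the original text. The size of the kept vocabulary is a tunable parameter in this sketch.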
