Journal: Knowledge-Based Systems

Authorship identification from unstructured texts



Abstract

Authorship identification is the task of identifying the authors of anonymous texts, given examples of each candidate author's writing. The growing volume of anonymous text on the Internet makes authorship identification increasingly urgent. It has been applied in a growing number of practical settings, including literary works, intelligence, criminal law, civil law, and computer forensics. In this paper, we propose a semantic association model covering voice, word dependency relations, and non-subject stylistic words to represent the writing style of unstructured texts by various authors; design an unsupervised approach to extract stylistic features; and employ principal component analysis and linear discriminant analysis to identify the authorship of texts. The paper provides a uniform, quantified method for capturing the syntactic and semantic stylistic characteristics of, and between, words and phrases, and the approach can to some extent address the problem of the independence of different dimensions. Experimental results on two English text corpora show that our approach significantly improves overall authorship identification performance.
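To make the task setup concrete, the following is a minimal sketch of authorship identification from known writing samples. It is not the semantic association model described in the abstract, and it does not use principal component analysis or linear discriminant analysis: as a simplified stand-in, it represents each text by the relative frequency of a few topic-independent function words and assigns an anonymous text to the author with the nearest average profile. The word list, the author names, the sample texts, and the function names (`style_vector`, `train`, `identify`) are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, very small function-word list (an assumption for this sketch);
# a realistic study would use a far larger set of stylistic markers.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "which", "upon", "but", "while"]


def style_vector(text):
    """Relative frequency of each function word among all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(samples):
    """Build one average style profile per author.

    samples maps an author name to a list of texts known to be by that author.
    """
    return {
        author: centroid([style_vector(t) for t in texts])
        for author, texts in samples.items()
    }


def identify(model, text):
    """Return the author whose profile is closest (Euclidean) to the text."""
    vector = style_vector(text)
    return min(model, key=lambda author: math.dist(vector, model[author]))


# Invented sample texts for two hypothetical authors.
KNOWN = {
    "author_a": [
        "Upon the hill which overlooks the town, the house stood upon its old stones.",
        "The letter, which arrived upon the morning, spoke of matters which troubled him.",
    ],
    "author_b": [
        "She ran to the store but it was closed, and she waited while the rain fell.",
        "He wanted to leave but stayed, while the others talked and laughed.",
    ],
}

model = train(KNOWN)
guess = identify(model, "The garden, which lay upon the river, was one which few had seen.")
print(guess)
```

Function-word frequencies are used here only because they are largely independent of topic, which loosely parallels the abstract's interest in non-subject stylistic words. The nearest-profile rule is a deliberately simple substitute for the dimensionality reduction and discriminant analysis the paper employs.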
