An integrated approach for intrinsic plagiarism detection


Abstract

Effective plagiarism detection methods are seen as essential for the next-generation web. In this paper, we present a novel approach to plagiarism detection that requires no reference collection. The proposed approach relies on statistical properties of the most common words and on Latent Semantic Analysis (LSA), which is applied to extract the usage patterns of those words. The method aims to generate a model of an author's "style" by revealing a set of features of authorship. The model generation procedure focuses on just one author, in an attempt to summarise the aspects of that author's style in a definitive and clear-cut manner. The feature set of the intrinsic model was based on the frequency of the most common words, their relative frequencies in the book series, and the deviation of these frequencies across all books by a particular author. The approach was evaluated using leave-one-out cross-validation on the CEN (Corpus of English Novels) data set. The results indicate that, by integrating deep latent semantic and stylometric analyses, hidden changes can be identified when no reference collection exists. The results also show that our Multi-Layer Perceptron based approach statistically outperforms Bayesian Network, Support Vector Machine and Random Forest models, predicting the author classes with an overall accuracy of 97%. (C) 2017 Elsevier B.V. All rights reserved.
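The three stylometric features named in the abstract (frequency of the most common words, their relative frequencies per book, and the deviation of those frequencies across an author's books) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tokenisation rule, the choice of population standard deviation, and the toy texts are all assumptions made for illustration, and the LSA and Multi-Layer Perceptron stages are not shown.

```python
import re
from collections import Counter
from statistics import pstdev


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def most_common_words(books, k):
    """Return the k most frequent words over all books by one author."""
    counts = Counter()
    for text in books:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(k)]


def relative_frequencies(text, vocab):
    """Relative frequency of each vocabulary word within a single book."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty input
    return [counts[word] / total for word in vocab]


def style_profile(books, k):
    """Hypothetical single-author profile: vocabulary, per-book relative
    frequencies, and their mean and deviation across the books."""
    vocab = most_common_words(books, k)
    per_book = [relative_frequencies(text, vocab) for text in books]
    columns = list(zip(*per_book))  # one column per vocabulary word
    mean = [sum(col) / len(col) for col in columns]
    deviation = [pstdev(col) for col in columns]  # population std. dev.
    return vocab, per_book, mean, deviation


# Toy stand-ins for two books by the same author.
books = [
    "the cat and the dog",
    "the bird and the fish and the frog",
]
vocab, per_book, mean, deviation = style_profile(books, k=2)
```

In a fuller pipeline, feature vectors of this kind would be passed to a dimensionality-reduction step and a classifier; which features feed which stage in the paper is not specified in the abstract.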

Bibliographic details

  • Source
    Future Generation Computer Systems | 2019, Issue 7 | pp. 700-712 | 13 pages
  • Author affiliations

    Coventry Univ, Fac Engn Environm & Comp, Sch Comp Elect & Maths, Coventry, W Midlands, England;

    Coventry Univ, Fac Engn Environm & Comp, Sch Comp Elect & Maths, Coventry, W Midlands, England;

    Coventry Univ, Fac Engn Environm & Comp, Sch Comp Elect & Maths, Coventry, W Midlands, England;

    Coventry Univ, Fac Engn Environm & Comp, Sch Comp Elect & Maths, Coventry, W Midlands, England;

    Xian Jiaotong Liverpool Univ, Int Business Sch Suzhou, Suzhou, Peoples R China;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification:
  • Keywords:
