Journal of Digital Information Management

A Fast Algorithm for Plagiarism Detection in Large-scale Data


Abstract

This paper proposes a fast plagiarism detection algorithm for large-scale data. Superficial plagiarism, such as copy and paste, can be detected with a simple document similarity measure based on string matching. The algorithm reduces the cost of computing this similarity by approximating it. The effects of the approximation on processing time and accuracy are evaluated in experiments on a data set generated from real scholarly documents. The results show that the algorithm based on the approximate similarity takes less than one-third of the processing time of the straightforward algorithm based on the exact similarity, in exchange for a slight decrease in accuracy.
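
The abstract does not say how the similarity is defined or approximated, so the following is only an illustrative sketch of the general trade-off, not the paper's method. It assumes an exact similarity computed as Jaccard overlap of character n-grams, and swaps in a bottom-k hash sketch as one possible approximation. All function names and the parameters `n` and `k` are hypothetical.

```python
import zlib


def shingles(text, n=5):
    """Return the set of character n-grams in text (whitespace-normalised, lowercased)."""
    text = " ".join(text.split()).lower()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def exact_similarity(a, b, n=5):
    """Exact Jaccard similarity over all character n-grams (assumed baseline)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def sketch(text, n=5, k=64):
    """Bottom-k sketch: the k smallest CRC32 hashes of the text's n-grams."""
    hashes = {zlib.crc32(s.encode("utf-8")) for s in shingles(text, n)}
    return sorted(hashes)[:k]


def approx_similarity(sketch_a, sketch_b, k=64):
    """Estimate Jaccard similarity from two bottom-k sketches (assumed approximation)."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # The k smallest hashes of the merged sketches act as a sample of the union.
    union_bottom = sorted(set_a | set_b)[:k]
    if not union_bottom:
        return 1.0
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)


if __name__ == "__main__":
    doc_a = "Plagiarism of superficial descriptions can be detected by string matching."
    doc_b = "Superficial plagiarism can be detected by simple string matching methods."
    print("exact :", exact_similarity(doc_a, doc_b))
    print("approx:", approx_similarity(sketch(doc_a), sketch(doc_b)))
```

In this sketch each document is reduced once to at most `k` integers, so comparing a pair costs a fixed amount of work regardless of document length. The price is an estimate that can deviate from the exact value, which mirrors the time versus accuracy trade-off the abstract describes.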
