Text Mining for Plagiarism Detection: Multivariate Pattern Detection for Recognition of Text Similarities

Abstract

In recent years, the problem of plagiarism has intensified with the availability of information in digital form and the accessibility of electronic libraries through the Internet. As a result, plagiarism detection has become a big data analytics problem: the number of digital sources is enormous, and a new document must be compared with millions of existing documents. This paper proposes a text mining methodology that can detect all common patterns between a document and the documents in a reference database. The technique is based on a pattern detection algorithm and a corresponding data structure that enables the algorithm to detect all common patterns. The methodology has been applied to a well-defined dataset, with very promising results, identifying difficult cases of plagiarism such as technical disguise.
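
To make the idea of "all common patterns" concrete, here is a minimal sketch that reports every maximal common substring between a document and each reference text. The abstract does not describe the paper's algorithm or data structure, so this is a brute-force dynamic-programming stand-in and not the authors' method. The names `common_patterns`, `scan_references`, and the `min_len` threshold are hypothetical, and the quadratic cost per document pair would not scale to millions of references.

```python
from typing import Dict, List, Tuple

Match = Tuple[int, int, str]  # (start in doc, start in ref, shared text)


def common_patterns(doc: str, ref: str, min_len: int = 8) -> List[Match]:
    """Return every maximal common substring of at least min_len characters.

    Brute-force O(len(doc) * len(ref)) sketch; an assumption for
    illustration, not the algorithm from the paper.
    """
    matches: List[Match] = []
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(doc) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            if doc[i - 1] == ref[j - 1]:
                # Length of the common run ending at doc[i-1] and ref[j-1].
                curr[j] = prev[j - 1] + 1
                # The run is maximal if it cannot be extended to the right.
                at_end = i == len(doc) or j == len(ref)
                if (at_end or doc[i] != ref[j]) and curr[j] >= min_len:
                    n = curr[j]
                    matches.append((i - n, j - n, doc[i - n:i]))
        prev = curr
    return matches


def scan_references(doc: str, references: Dict[str, str],
                    min_len: int = 8) -> Dict[str, List[Match]]:
    """Compare doc against each reference; keep only references with hits."""
    results: Dict[str, List[Match]] = {}
    for name, ref in references.items():
        hits = common_patterns(doc, ref, min_len)
        if hits:
            results[name] = hits
    return results
```

For example, `scan_references(new_text, {"ref-1": text_1, "ref-2": text_2})` would return, per reference, the positions and text of each shared passage. A production system would replace the pairwise comparison with an index over the reference database.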
