Journal: Expert Systems with Applications

Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style

Abstract

Plagiarism detection is of special interest to educational institutions, and with the proliferation of digital documents on the Web, computational systems for this task have become important. Traditional methods for automatic plagiarism detection compute similarity measures on a document-to-document basis, but this is not always possible because the potential source documents are not always available. We apply text mining, exploring the use of words as a linguistic feature for analyzing a document by modeling the writing style present in it. The main goal is to discover deviations in style by looking for segments of the document that could have been written by another person. This can be treated as a classification problem using self-based information (information drawn only from the document itself), in which paragraphs with significant deviations in style are treated as outliers. This so-called intrinsic plagiarism detection approach needs no comparison against possible sources, and our model relies only on the use of words, so it is not language specific. We demonstrate that this feature shows promise in this area, achieving reasonable results compared to benchmark models.
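The idea of treating stylistically deviant paragraphs as outliers can be illustrated with a minimal Python sketch. This is not the authors' model, which the abstract does not specify. It assumes a simple stand-in: each paragraph's word counts are compared with the word counts of the rest of the document by cosine distance, and paragraphs whose distance lies well above the document's mean are flagged. The names `tokenize`, `cosine_distance`, `style_outliers`, and the `z_threshold` value are hypothetical choices made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (assumed tokenizer, not from the paper)."""
    return re.findall(r"\w+", text.lower())


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def style_outliers(paragraphs, z_threshold=1.5):
    """Flag paragraphs whose word usage deviates from the rest of the document.

    Uses only the document itself (no external sources). The z-score rule
    is an illustrative assumption, not the method evaluated in the paper.
    """
    counts = [Counter(tokenize(p)) for p in paragraphs]
    total = Counter()
    for c in counts:
        total.update(c)

    # Distance between each paragraph and the document with that paragraph removed.
    distances = [cosine_distance(c, total - c) for c in counts]

    mean = sum(distances) / len(distances)
    std = math.sqrt(sum((d - mean) ** 2 for d in distances) / len(distances))
    if std == 0:
        return [], distances
    flagged = [i for i, d in enumerate(distances) if (d - mean) / std > z_threshold]
    return flagged, distances


# Toy document: four paragraphs share a vocabulary, the last one does not.
paragraphs = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "the cat sat on the mat",
    "the cat sat on the mat",
    "quantum flux capacitor overload imminent",
]
flagged, distances = style_outliers(paragraphs)
print("flagged paragraph indices:", flagged)
```

Because the sketch relies only on word counts, it carries no language-specific rules, although the whitespace-oriented tokenizer assumed here would need replacing for languages written without word separators.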
