Journal: Knowledge-Based Systems

Methods for cross-language plagiarism detection



Abstract

Three factors are driving a rise in plagiarism across languages: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people immersed in a foreign country can still consult material written in their native language, and (iii) people are often interested in writing in a language different from their native one. Most approaches to automatically detecting cross-language plagiarism depend on a preliminary translation, which is not always available. In this paper we propose a freely available architecture for plagiarism detection across languages covering the entire process: heuristic retrieval, detailed analysis, and post-processing. On top of this architecture we explore the suitability of three cross-language similarity estimation models: Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Character n-Grams (CL-CNG), and Translation plus Monolingual Analysis (T + MA). The three models differ inherently in nature and in the resources they require. They are tested extensively under the same conditions on the different plagiarism detection sub-tasks, something never done before. The experiments show that T + MA produces the best results, closely followed by CL-ASA. Still, CL-ASA obtains higher precision, an important factor in plagiarism detection when less user intervention is desired.
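
To make the character n-gram idea behind CL-CNG concrete, the following is a minimal sketch and not the authors' implementation. It assumes character 3-grams, a simple lowercase-and-strip-punctuation normalization, and cosine similarity over raw n-gram counts; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count the character n-grams of a text after a simple normalization."""
    # Assumed normalization: lowercase, then collapse every run of
    # non-word characters (punctuation, whitespace) into one space.
    normalized = re.sub(r"[^\w]+", " ", text.lower()).strip()
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text is too short to yield any n-gram
    return dot / (norm_a * norm_b)


def cl_cng_similarity(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Hypothetical helper: compare two texts by their character n-gram profiles."""
    return cosine_similarity(char_ngrams(doc_a, n), char_ngrams(doc_b, n))


if __name__ == "__main__":
    english = "the international organization"
    spanish = "la organización internacional"
    print(cl_cng_similarity(english, spanish))
```

A measure of this kind needs no translation resources, but it can only pick up overlap between languages that share an alphabet and some vocabulary, such as the cognates in the English and Spanish strings above.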
