Journal: Knowledge-Based Systems

Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language

Abstract

Cross-language (CL) plagiarism detection aims to detect plagiarised fragments of text among documents in different languages. The main research question of this work is whether knowledge graph representations and continuous space representations can complement each other and improve on the state-of-the-art performance of CL plagiarism detection methods. To this end, we propose and evaluate hybrid models that assess the semantic similarity of two segments of text in different languages. The proposed hybrid models combine knowledge graph representations with continuous space representations, aiming to exploit their complementarity in capturing different aspects of cross-lingual similarity. We also present continuous word alignment-based similarity analysis, a new model for estimating the similarity between text fragments. We compare these approaches with several state-of-the-art models on the task of CL plagiarism detection and study their performance in detecting plagiarism cases of different lengths and obfuscation types. We conduct experiments on Spanish-English and German-English datasets. Experimental results show that continuous representations allow the continuous word alignment-based similarity analysis model to obtain competitive results and the knowledge-based document similarity model to outperform the state of the art in CL plagiarism detection. © 2016 Elsevier B.V. All rights reserved.
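The abstract names an alignment-based similarity over continuous word representations but gives no formula. As a rough illustration only, and not the authors' model, the general idea can be sketched as follows: embed the words of both fragments in a shared cross-lingual vector space, align each word to its most similar counterpart in the other fragment, and average the alignment scores in both directions. The toy vectors, the greedy best-match alignment, and the names `TOY_EMBEDDINGS`, `cosine`, and `aligned_similarity` are all assumptions introduced for this sketch.

```python
import math

# Hypothetical toy vectors in a shared English-Spanish space.
# The values are made up for illustration; a real system would load
# pretrained cross-lingual embeddings instead.
TOY_EMBEDDINGS = {
    "cat": [0.90, 0.10, 0.00],
    "gato": [0.88, 0.12, 0.02],
    "eats": [0.10, 0.90, 0.10],
    "come": [0.12, 0.85, 0.15],
    "fish": [0.00, 0.20, 0.95],
    "pescado": [0.05, 0.18, 0.90],
    "car": [0.70, 0.00, 0.70],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def aligned_similarity(src_tokens, tgt_tokens, embeddings=TOY_EMBEDDINGS):
    """Symmetric greedy-alignment similarity between two token lists.

    Each word is aligned to its best match in the other fragment; the
    score is the mean of the two directed averages. Out-of-vocabulary
    tokens are skipped (an assumption of this sketch).
    """
    src = [embeddings[t] for t in src_tokens if t in embeddings]
    tgt = [embeddings[t] for t in tgt_tokens if t in embeddings]
    if not src or not tgt:
        return 0.0

    def directed(a, b):
        # Average, over words in a, of the best cosine match found in b.
        return sum(max(cosine(u, v) for v in b) for u in a) / len(a)

    return 0.5 * (directed(src, tgt) + directed(tgt, src))


if __name__ == "__main__":
    english = ["cat", "eats", "fish"]
    print(aligned_similarity(english, ["gato", "come", "pescado"]))
    print(aligned_similarity(english, ["car"]))
```

With these toy vectors, the translated fragment scores higher than the unrelated one. A detector built on such a score would still need a decision threshold and a candidate retrieval step, neither of which is specified in the abstract.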