
Using Sentence Embedding for Cross-Language Plagiarism Detection



Abstract

The growth of textual content in various languages and the advancement of automatic translation systems have led to an increase in cases of translated plagiarism. When a text is translated into another language, word order changes and words may be replaced by synonyms, which makes detection more challenging. This paper introduces a new technique for English-Arabic cross-language plagiarism detection. The method combines word embeddings, term weighting techniques, and universal sentence encoder models to improve the detection of sentence similarity. The proposed model was evaluated on English-Arabic cross-lingual datasets, and the experimental results show improved performance compared with other Arabic-English cross-lingual evaluation methods presented at SemEval-2017.
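The abstract does not say how the word embeddings and term weights are combined, so the following is only a minimal sketch of one common approach: an IDF-weighted average of word vectors, compared by cosine similarity. It assumes English and Arabic words already live in a shared embedding space. The vectors, the toy corpus, and the function names (`idf_weights`, `sentence_vector`, `cosine`) are all hypothetical, and the universal sentence encoder component is not shown.

```python
import math

# Toy 3-dimensional vectors standing in for a shared English-Arabic
# embedding space. All values are invented for illustration only.
TOY_VECTORS = {
    "the": [0.3, 0.3, 0.3],
    "cat": [0.9, 0.1, 0.0],
    "sleeps": [0.1, 0.9, 0.0],
    "قطة": [0.88, 0.12, 0.02],
    "تنام": [0.12, 0.86, 0.04],
    "البنك": [0.0, 0.1, 0.9],
    "مغلق": [0.1, 0.0, 0.8],
}

# Toy corpus (tokenized sentences) used only to estimate term weights.
TOY_CORPUS = [
    ["the", "cat", "sleeps"],
    ["the", "bank"],
    ["the", "dog"],
    ["قطة", "تنام"],
    ["البنك", "مغلق"],
]


def idf_weights(corpus):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(corpus)
    doc_freq = {}
    for doc in corpus:
        for token in set(doc):
            doc_freq[token] = doc_freq.get(token, 0) + 1
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def sentence_vector(tokens, vectors, weights):
    """Weighted average of the word vectors; unknown tokens are skipped."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        if token not in vectors:
            continue
        w = weights.get(token, 1.0)  # default weight for unseen tokens
        for i, x in enumerate(vectors[token]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # all-zero vector when no token is known
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


WEIGHTS = idf_weights(TOY_CORPUS)
english = sentence_vector(["the", "cat", "sleeps"], TOY_VECTORS, WEIGHTS)
arabic_translation = sentence_vector(["قطة", "تنام"], TOY_VECTORS, WEIGHTS)
arabic_unrelated = sentence_vector(["البنك", "مغلق"], TOY_VECTORS, WEIGHTS)

sim_match = cosine(english, arabic_translation)
sim_other = cosine(english, arabic_unrelated)
print("translation pair:", round(sim_match, 3))
print("unrelated pair:", round(sim_other, 3))
```

In this toy setup the frequent word "the" receives a lower weight than the content words, so it contributes less to the sentence vector. A real system would replace the toy vectors with trained cross-lingual embeddings and would need a decision threshold tuned on labeled data.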
