UPPC - Urdu Paraphrase Plagiarism Corpus

Abstract

Paraphrase plagiarism is a significant and widespread problem, and research shows that it is hard to detect. Several methods and automatic systems have been proposed to deal with it. However, evaluating and comparing such solutions is not possible because benchmark corpora with manually created examples of paraphrase plagiarism are unavailable. To address this issue, we present a paraphrase plagiarism corpus containing simulated (manually created) examples in Urdu, a language widely spoken around the world. This resource is the first of its kind developed for Urdu, and we believe it will be a valuable contribution to the evaluation of paraphrase plagiarism detection systems.
