Journal of Information Science

Cross-lingual text alignment for fine-grained plagiarism detection

Abstract

Fast and easy access to a wide range of documents in various languages, combined with the wide availability of translation and editing tools, has created a need for effective tools for detecting cross-lingual plagiarism. Given a suspicious document, cross-lingual plagiarism detection comprises two main subtasks: retrieving documents that are candidate sources for that document, and analysing those candidates one by one to determine their similarity to the suspicious document. In this article, we examine the second subtask, also called the detailed analysis subtask, whose goal is to align plagiarised fragments between source and suspicious documents written in different languages. Our proposed approach has two main steps. The first step finds candidate plagiarised fragments and focuses on high recall. The second step performs a more precise similarity analysis based on dynamic text alignment, which filters the results by finding alignments between the identified fragments. Across these two steps, term proximity is considered at different levels of granularity. In both steps, our approach uses a dictionary to obtain translations of individual terms instead of using a machine translation system to convert longer passages from one language to another. We used a weighting scheme to distinguish between multiple translations of the same term. Experimental results show that our method outperforms the methods used by the systems that achieved the best results in the PAN-2012 and PAN-2014 competitions.
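The two-step pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the authors' method. The toy dictionary is hypothetical. The even 1/k split of a term's weight across its k translations stands in for the paper's unspecified weighting scheme. The seed-chaining rule, with a hypothetical `max_gap` parameter, stands in for the high-recall candidate step. A Smith-Waterman-style local alignment, with hypothetical `gap` and `mismatch` penalties, stands in for the paper's dynamic text alignment.

```python
from typing import Dict, List, Tuple

# Hypothetical toy bilingual dictionary: source term -> candidate
# translations in the suspicious document's language.
Dictionary = Dict[str, List[str]]
Span = Tuple[int, int]  # half-open [start, end) term positions


def translation_weights(term: str, dictionary: Dictionary) -> Dict[str, float]:
    """Assumed weighting: split a term's unit weight evenly over its translations."""
    candidates = dictionary.get(term, [])
    return {t: 1.0 / len(candidates) for t in candidates}


def term_score(src_term: str, susp_term: str, dictionary: Dictionary) -> float:
    """Weight of susp_term as a translation of src_term (0.0 if not listed)."""
    return translation_weights(src_term, dictionary).get(susp_term, 0.0)


def candidate_fragments(src: List[str], susp: List[str], dictionary: Dictionary,
                        max_gap: int = 3) -> List[Tuple[Span, Span]]:
    """High-recall step (sketch): collect every dictionary match ("seed") and
    chain seeds lying within max_gap terms of the previous seed in both texts."""
    seeds = sorted((i, j) for i, a in enumerate(src) for j, b in enumerate(susp)
                   if term_score(a, b, dictionary) > 0)
    groups: List[List[Tuple[int, int]]] = []
    for i, j in seeds:
        prev_i, prev_j = groups[-1][-1] if groups else (None, None)
        if groups and i - prev_i <= max_gap and abs(j - prev_j) <= max_gap:
            groups[-1].append((i, j))
        else:
            groups.append([(i, j)])
    return [((min(i for i, _ in g), max(i for i, _ in g) + 1),
             (min(j for _, j in g), max(j for _, j in g) + 1)) for g in groups]


def local_alignment(src: List[str], susp: List[str], dictionary: Dictionary,
                    gap: float = 0.3, mismatch: float = 0.2
                    ) -> Tuple[float, Span, Span]:
    """Precise step (sketch): Smith-Waterman-style local alignment of two term
    sequences. Returns the best score and the aligned spans in src and susp."""
    n, m = len(src), len(susp)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = term_score(src[i - 1], susp[j - 1], dictionary)
            diag = H[i - 1][j - 1] + (s if s > 0 else -mismatch)
            H[i][j] = max(0.0, diag, H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the local score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = term_score(src[i - 1], susp[j - 1], dictionary)
        if H[i][j] == H[i - 1][j - 1] + (s if s > 0 else -mismatch):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


if __name__ == "__main__":
    toy_dict = {"cat": ["gato"], "sat": ["sentó", "sentaba"],
                "on": ["en", "sobre"], "mat": ["alfombra", "estera"]}
    src = ["cat", "sat", "on", "mat"]
    susp = ["hola", "gato", "sentó", "en", "alfombra", "adiós"]
    print(candidate_fragments(src, susp, toy_dict))  # [((0, 4), (1, 5))]
    print(local_alignment(src, susp, toy_dict))      # (2.5, (0, 4), (1, 5))
```

On this toy English/Spanish pair, both steps recover the same fragment: source terms 0 to 3 aligned with suspicious terms 1 to 4. In a more realistic setting, the seed-chaining step would deliberately over-generate candidates. The alignment step's gap and mismatch penalties would then trim unrelated terms at the fragment edges, which mirrors the abstract's recall-then-precision split.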