Journal: Pattern Analysis and Applications

Paraphrase plagiarism identification with character-level features

Abstract

Several methods have been proposed for determining plagiarism between pairs of sentences, passages or even full documents. However, most of these methods fail to reliably detect paraphrase plagiarism because the task is highly complex, even for human beings. Paraphrase plagiarism identification consists of automatically recognizing document fragments that contain reused text that has been intentionally disguised through rewording practices such as semantic equivalences, discursive changes and morphological or lexical substitutions. Our main hypothesis is that the original author's writing style fingerprint prevails in the plagiarized text even when paraphrases occur. Thus, in this paper we propose a novel text representation scheme that captures both content and style characteristics of texts by means of character-level features. As an additional contribution, we describe the methodology followed to construct an appropriate corpus for the task of paraphrase plagiarism identification, which represents a valuable new resource for the NLP community and for future research in this field.
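
The abstract does not say which character-level features the representation uses. As a rough illustration only, the sketch below assumes overlapping character trigrams compared with cosine similarity, a common way to capture both content (shared word fragments) and style (punctuation and function-word habits). The function names `char_ngrams`, `cosine_similarity` and `ngram_similarity`, the choice of `n = 3`, and the similarity measure are all assumptions made for this example, not the authors' method.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (hypothetical feature choice).

    Spaces and punctuation are kept because they may carry stylistic signal;
    case is folded and runs of whitespace are collapsed to a single space.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as sharing nothing
    return dot / (norm_a * norm_b)


def ngram_similarity(source: str, suspicious: str, n: int = 3) -> float:
    """Score how closely two fragments match at the character level."""
    return cosine_similarity(char_ngrams(source, n), char_ngrams(suspicious, n))


if __name__ == "__main__":
    original = "The author rewrote the sentence to hide the reuse."
    paraphrase = "To hide the reuse, the sentence was rewritten by the author."
    print(round(ngram_similarity(original, paraphrase), 3))
```

A score near 1 means the two fragments share most of their character trigrams. Any decision threshold or classifier built on such scores would have to be tuned on labelled data such as the corpus the abstract mentions.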
