IEEE International Symposium on Computational Intelligence and Informatics

Selective chunking — Easy and effective way to estimate text similarity

Abstract

Plagiarism is a serious problem, especially in academic environments. We define it as the theft of somebody else's work or ideas. In this paper we focus on plagiarism in student assignments written in natural language. We propose an approach intended to identify copied fragments of text faster and more accurately than standard approaches. We first identify topic-related pairs of text documents and then select for further processing only those pairs that discuss a similar topic. We experimented with different chunking methods in the comparison process to overcome typical problems, such as short fragments of text copied from other documents. The results show that our approach is more suitable for plagiarism detection than a standard n-gram method.
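
The two-stage idea in the abstract (filter document pairs by topic, then compare chunks only for the surviving pairs) can be illustrated with a minimal sketch. The abstract does not specify the topic measure or the selective chunking methods, so everything concrete here is an assumption: bag-of-words cosine similarity stands in for the topic filter, plain word n-grams (the baseline the abstract mentions) stand in for the chunks, and the names `compare_documents`, `topic_threshold`, and `n` are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two bag-of-words vectors (assumed topic measure)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_ngrams(tokens, n):
    """Set of overlapping word n-grams (baseline chunking, not the paper's method)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(set_a, set_b):
    """Jaccard overlap of two chunk sets; 0.0 when both are empty."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def compare_documents(docs, topic_threshold=0.3, n=3):
    """Score only topic-related pairs.

    Stage 1: skip pairs whose bag-of-words cosine similarity is below
    topic_threshold (assumed value). Stage 2: score the remaining pairs
    by the Jaccard overlap of their word n-gram sets.
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    scores = {}
    for a, b in combinations(sorted(docs), 2):
        if cosine_similarity(tokens[a], tokens[b]) < topic_threshold:
            continue  # different topics: no chunk comparison needed
        scores[(a, b)] = jaccard(word_ngrams(tokens[a], n),
                                 word_ngrams(tokens[b], n))
    return scores
```

Calling `compare_documents` with a dictionary that maps document names to their text returns a score for each topic-related pair. The topic filter is what saves work: unrelated pairs never reach the more expensive chunk comparison. Swapping `word_ngrams` for another chunking function would be the place to try alternative chunking strategies.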