Selective chunking — Easy and effective way to estimate text similarity

Abstract

Plagiarism is a serious problem, especially in academic environments. We define it as the theft of somebody else's work or ideas. In this paper we focus on plagiarism in student assignments written in natural language. We propose an approach that should identify copied fragments of text faster and more accurately than standard approaches. We first identify pairs of topically related text documents and then select for further processing only those pairs that discuss a similar topic. To overcome typical problems, such as short fragments of text copied from other documents, we experimented with different chunking methods in the comparison process. The results show that our approach is more suitable for plagiarism detection than a standard n-gram method.
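The abstract outlines a two-stage process (filter document pairs by topic, then compare the surviving pairs) but does not give algorithmic details. The Python sketch below illustrates only that general filter-then-compare idea, under loudly stated assumptions: cosine similarity over word counts stands in for the topic filter, `topic_threshold=0.3` is an arbitrary illustrative value, and Jaccard overlap of word trigrams stands in for a "standard n-gram method". All function names (`tokenize`, `cosine`, `ngram_overlap`, `candidate_pairs`) are hypothetical. This is not the authors' selective chunking technique, whose chunking rules are not described here.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ngrams(tokens, n):
    """Set of contiguous word n-grams (tuples); empty if the text is shorter than n."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(t1, t2, n=3):
    """Jaccard overlap of word n-gram sets: an assumed stand-in for a 'standard n-gram method'."""
    a, b = ngrams(t1, n), ngrams(t2, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(docs, topic_threshold=0.3, n=3):
    """Hypothetical two-stage pipeline (not the paper's method).

    Stage 1 keeps only document pairs whose word-count cosine similarity reaches
    topic_threshold (an assumed value). Stage 2 scores the surviving pairs by
    n-gram overlap. Returns (name1, name2, score) tuples, highest score first.
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    bags = {name: Counter(toks) for name, toks in tokens.items()}
    scored = []
    for x, y in combinations(sorted(docs), 2):
        if cosine(bags[x], bags[y]) < topic_threshold:
            continue  # pair does not appear to discuss a similar topic; skip it
        scored.append((x, y, ngram_overlap(tokens[x], tokens[y], n)))
    return sorted(scored, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "Plagiarism is a serious problem in academic environments.",
        "b": "Plagiarism is a serious problem in many academic environments today.",
        "c": "The recipe needs two eggs and a cup of flour.",
    }
    print(candidate_pairs(docs))  # [('a', 'b', 0.4)]
```

Filtering first means the comparatively expensive n-gram comparison runs only on pairs that pass the cheap topic check; here the off-topic document "c" never reaches stage 2. The tokenizer, threshold, and n = 3 are illustrative choices, not values taken from the paper.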

Bibliographic record

Source: IEEE International Symposium on Computational Intelligence and Informatics, 2013, pp. 381-385 (5 pages)
Authors: Kucecka Tomas; Chuda Daniela; Samuhel Patrik
Original format: PDF

Similar documents

1. EFFECTIVE SEMANTIC TEXT SIMILARITY METRIC USING NORMALIZED ROOT MEAN SCALED SQUARE ERROR [J]. Issa Atoum, Maruthi Rohit Ayyagari. Journal of Theoretical and Applied Information Technology, 2019, no. 12.
2. Estimating the whole-body effective dose and health risks as well as introducing a new easy method for eye lens dosimetry in interventional cardiology procedures [J]. Alireza Hatami, Mahmoud Bagheri, Farzaneh Falahati. MethodsX, 2020, no. 1.
3. Selective word encoding for effective text representation [J]. Savaş Özkan, Akın Özkan. Turkish Journal of Electrical Engineering and Computer Sciences, 2019, no. 2.
4. Selective chunking — Easy and effective way to estimate text similarity [C]. Kucecka Tomas, Chuda Daniela, Samuhel Patrik. IEEE International Symposium on Computational Intelligence and Informatics, 2013.
5. Advanced techniques for Chinese chunk segmentation and the similarity measure of Chinese sentences [D]. Wang, Rongbo. 2006.
6. Estimating the whole-body effective dose and health risks as well as introducing a new easy method for eye lens dosimetry in interventional cardiology procedures [O]. Alireza Hatami, Mahmoud Bagheri, Farzaneh Falahati. 2020.
7. DTSim at SemEval-2016 Task 2: Interpreting Similarity of Texts Based on Automated Chunking, Chunk Alignment and Semantic Relation Prediction [O]. Rajendra Banjade, Nabin Maharjan, Nobal Bikram Niraula. 2016.