Conference: International Conference on Advances in Computing, Communications and Informatics

Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system

Abstract

Plagiarism is an illicit act that has become a prime concern, mainly in the educational and research domains. This deceitful act is usually referred to as intellectual theft, and it has increased swiftly with rapid technological development and easier access to information. An efficient plagiarism detection system is therefore urgently needed. This paper investigates different combined similarity metrics for extrinsic plagiarism detection and focuses on showing the importance of combined similarity metrics over the commonly used single metric in the plagiarism detection task. Further, the impact of utilizing part-of-speech (POS) tagging in the plagiarism detection model is analyzed. Different combinations of four single metrics, namely cosine similarity, Dice coefficient, match coefficient, and fuzzy-semantic measure, are used with and without POS tag information. These systems are evaluated on the PAN-2014 training and test data sets, and the results are analyzed and compared using the standard PAN measures, viz. recall, precision, granularity, and plagdet_score.
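
To make the metrics named in the abstract concrete, here is a minimal Python sketch of three of them, together with the PAN plagdet score. It is an illustration under stated assumptions, not the paper's implementation. The match coefficient is taken here as the raw count of shared terms (one common definition, which may differ from the paper's). The combination rule in `is_candidate` and its thresholds are hypothetical. The fuzzy-semantic measure is omitted because the abstract does not define it.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between the two term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[t] * freq_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dice_coefficient(tokens_a, tokens_b):
    """Dice coefficient on term sets: 2|A n B| / (|A| + |B|)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def match_coefficient(tokens_a, tokens_b):
    """Assumed definition: number of distinct terms shared by both texts."""
    return len(set(tokens_a) & set(tokens_b))


def is_candidate(tokens_a, tokens_b, cos_min=0.5, dice_min=0.5, match_min=3):
    """Hypothetical combined rule: every metric must pass its threshold."""
    return (
        cosine_similarity(tokens_a, tokens_b) >= cos_min
        and dice_coefficient(tokens_a, tokens_b) >= dice_min
        and match_coefficient(tokens_a, tokens_b) >= match_min
    )


def plagdet(precision, recall, granularity):
    """PAN plagdet score: F1 divided by log2(1 + granularity)."""
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


suspicious = "the quick brown fox jumps".split()
source = "the quick brown dog jumps".split()
print(cosine_similarity(suspicious, source))
print(dice_coefficient(suspicious, source))
print(match_coefficient(suspicious, source))
print(is_candidate(suspicious, source))
```

The functions accept any hashable tokens, so one way to bring in POS information would be to pass (word, tag) pairs instead of bare words, so that a term only matches when both the word and its tag agree. Whether the paper uses POS tags this way is not stated in the abstract.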
