International Conference on Computer Science and Network Technology

Comparisons of Keyphrase Extraction Methods in Source Retrieval of Plagiarism Detection

Abstract

In the source retrieval stage of plagiarism detection, the rationale for keyword extraction is to select only those words or phrases that maximize the chance of retrieving the source documents that match the suspicious document. Several previous studies have used four keyword extraction methods for source retrieval: TF-IDF (term frequency-inverse document frequency); weighted TF-IDF (the TF-IDF of a term scaled by a coefficient that depends on the position in which the term occurs); TF-IDF based on passages; and weighted TF-IDF based on passages. According to those studies, TF-IDF computed over the full document and weighted TF-IDF achieve higher performance. However, our experiments show that the same keyword extraction method can yield different retrieval results for different types of plagiarism, and that different methods can yield significantly different results for the same type of plagiarism. In this study, we carry out further experiments on these methods. All comparison experiments are implemented with the vector space model. The experimental results show that TF-IDF based on passages is the best choice.
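The four keyword extraction strategies compared above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' implementation: the smoothed IDF formula, the stopword list, the section names and coefficients (for example a title weight of 3.0), passages supplied as pre-split dictionaries, and ranking candidates by cosine similarity are all assumptions, and every function name is illustrative. Plain and weighted TF-IDF differ only in the `coefficients` argument; the passage-based variants run the same extraction once per passage, producing one query per passage instead of one per document.

```python
# Hypothetical sketch of the keyword extraction variants; not the paper's code.
import math
import re
from collections import Counter

# Assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "on", "in", "and", "to", "is", "for"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_idf(background_docs):
    """Return an assumed smoothed IDF: idf(t) = ln((N + 1) / (df(t) + 1)) + 1."""
    n = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(tokenize(doc)))
    # Counter returns 0 for unseen terms, so they receive the maximum IDF.
    return lambda term: math.log((n + 1) / (df[term] + 1)) + 1


def term_freq(sections, coefficients=None):
    """Term frequencies summed over a {section_name: text} dict.

    With coefficients (e.g. {"title": 3.0}), each occurrence is scaled by the
    coefficient of the section it appears in: the weighted TF-IDF variant.
    """
    tf = Counter()
    for name, text in sections.items():
        weight = 1.0 if coefficients is None else coefficients.get(name, 1.0)
        for tok in tokenize(text):
            tf[tok] += weight
    return tf


def tfidf(tf, idf):
    """Multiply each (possibly weighted) term frequency by its IDF."""
    return {t: f * idf(t) for t, f in tf.items()}


def top_k(scores, k):
    """Highest-scoring terms; ties broken alphabetically for determinism."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def keywords_full(sections, idf, k, coefficients=None):
    """One query for the whole suspicious document (plain or weighted TF-IDF)."""
    return top_k(tfidf(term_freq(sections, coefficients), idf), k)


def keywords_by_passage(passages, idf, k, coefficients=None):
    """One query per passage; each passage is its own {section: text} dict."""
    return [keywords_full(p, idf, k, coefficients) for p in passages]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sources(query_terms, candidates, idf):
    """Vector space model: rank {name: text} candidates by cosine to the query."""
    q = {t: idf(t) for t in query_terms}
    scored = [(name, cosine(q, tfidf(Counter(tokenize(text)), idf)))
              for name, text in candidates.items()]
    return sorted(scored, key=lambda s: -s[1])


if __name__ == "__main__":
    idf = build_idf(["the cat sat on the mat", "the dog sat on the log",
                     "plagiarism detection uses source retrieval"])
    doc = {"title": "source retrieval", "body": "the cat sat on the mat"}
    print(keywords_full(doc, idf, 2))                  # ['cat', 'mat']
    print(keywords_full(doc, idf, 2, {"title": 3.0}))  # ['retrieval', 'source']
```

In this toy example the assumed title coefficient pushes `retrieval` and `source` ahead of the body terms, which illustrates how positional weighting changes the selected query terms. It says nothing about which method retrieves sources better; that is the question the experiments in this study address.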