Journal: International Journal of Innovative Computing Information and Control

INDONESIAN TEXT DOCUMENT SIMILARITY DETECTION SYSTEM USING RABIN-KARP AND CONFIX-STRIPPING ALGORITHMS



Abstract

Nowadays, faster and easier ways of finding information can bring negative side effects such as plagiarism. Many software tools and websites can be used to check for plagiarism, but unfortunately they are not well suited to scientific papers written in Bahasa Indonesia because they are designed for English text. Therefore, a document similarity detection system that is better suited to papers written in Bahasa Indonesia is needed. Rabin-Karp is an algorithm that can be used to check the similarity between documents, while Confix-Stripping is an algorithm that can search for the basic (root) form of words in Bahasa Indonesia. This research has successfully implemented the Rabin-Karp and Confix-Stripping algorithms. Tests performed with various document scenarios and algorithm combinations yielded performance results for the system in terms of time and similarity level. The system with pure Rabin-Karp provides the best performance in terms of both time and accuracy, with an average total processing time of 0.0123 seconds and an average similarity rate of 89.1967%. The accuracy level given by the system is 0.7. Adding a stemming process or N-Gram to the system can also improve some test results in terms of processing time and similarity level.
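To make the Rabin-Karp idea concrete, here is a minimal sketch of how rolling hashes over character k-grams can be turned into a similarity percentage. The abstract does not give the paper's hashing parameters or similarity formula, so everything specific here is an assumption for illustration: the function names `kgram_hashes` and `similarity`, the window length `k=5`, the base and modulus, the text normalisation, and the use of the Dice coefficient over the sets of distinct fingerprints.

```python
def kgram_hashes(text, k=5, base=256, mod=1_000_003):
    """Return the set of Rabin-Karp rolling hashes of all k-character windows.

    k, base, and mod are illustrative choices, not values from the paper.
    """
    # Assumed normalisation: lowercase and keep only letters and digits.
    s = "".join(ch for ch in text.lower() if ch.isalnum())
    if len(s) < k:
        return set()

    high = pow(base, k - 1, mod)  # weight of the leading character in a window

    # Hash of the first window, computed directly.
    h = 0
    for ch in s[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = {h}

    # Slide the window: drop the leading character, append the next one.
    for i in range(k, len(s)):
        h = (h - ord(s[i - k]) * high) % mod
        h = (h * base + ord(s[i])) % mod
        hashes.add(h)
    return hashes


def similarity(doc_a, doc_b, k=5):
    """Dice coefficient (as a percentage) over the two fingerprint sets."""
    ha, hb = kgram_hashes(doc_a, k), kgram_hashes(doc_b, k)
    if not ha and not hb:
        return 0.0
    return 200.0 * len(ha & hb) / (len(ha) + len(hb))


if __name__ == "__main__":
    a = "Sistem deteksi kemiripan dokumen teks bahasa Indonesia"
    b = "Sistem deteksi kemiripan dokumen berbahasa Indonesia"
    print(f"similarity: {similarity(a, b):.2f}%")
```

Because only hash values are compared, two different k-grams that collide under the modulus would be counted as a match; a production system would typically verify matches against the underlying text or use a larger modulus. A stemming step such as Confix-Stripping would, under this sketch, be applied to the words before `kgram_hashes` is called.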
