Conference: International Conference on Informatics, Engineering, Science and Technology

Comparison of document similarity measurements in scientific writing using Jaro-Winkler Distance method and Paragraph Vector method

Abstract

The purpose of this research is to study methods for measuring document similarity and to determine which is most suitable for Indonesian scientific writing. The methods compared were Jaro-Winkler Distance and Doc2Vec (Paragraph Vector). Jaro-Winkler is a method that calculates the distance between two strings and uses it to measure their similarity. Doc2Vec (Paragraph Vector) represents documents as vectors learned through a machine learning process, so that the documents can be compared. This study compares the plagiarism detection results of the Jaro-Winkler Distance method with those of the Doc2Vec method. The two methods were evaluated on the accuracy of their document comparisons and on their speed. On the dataset created for this study, Doc2Vec outperformed the Jaro-Winkler Distance algorithm in comparing document similarity. Therefore, future development of document similarity methods for Indonesian scientific writing should be easier using Doc2Vec (Paragraph Vector).
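To illustrate the string-based side of the comparison, here is a minimal Python sketch of Jaro-Winkler similarity. It assumes the common textbook formulation (prefix scale p = 0.1, common prefix capped at 4 characters) and applies the Winkler prefix boost unconditionally, omitting the 0.7 boost threshold that some implementations use. The function names `jaro` and `jaro_winkler` are illustrative, and the abstract does not say how the authors applied the measure to whole documents (for example, whole texts versus individual sentences).

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only "match" if they are equal and no farther apart than this.
    window = max(0, max(len1, len2) // 2 - 1)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        end = min(i + window + 1, len2)
        for j in range(start, end):
            if s2_matched[j] or s2[j] != ch:
                continue
            s1_matched[i] = s2_matched[j] = True
            matches += 1
            break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    k = 0
    out_of_order = 0
    for i in range(len1):
        if not s1_matched[i]:
            continue
        while not s2_matched[k]:
            k += 1
        if s1[i] != s2[k]:
            out_of_order += 1
        k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed p=0.1, max prefix 4)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
    print(round(jaro_winkler("DIXON", "DICKSONX"), 4))  # 0.8133
```

Because the matching window grows with string length, the inner loop costs roughly O(n·m) for two texts of lengths n and m, so applying the measure to full documents can be slow.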