
Comparison of document similarity algorithms in extracting document keywords from an academic paper

Abstract

This study validates a list of keywords, derived from a scientific article by a domain expert drawing on years of knowledge, using prominent document similarity algorithms. A list of handcrafted keywords generated by Electric Double Layer Capacitor (EDLC) experts is chosen, and documents relevant to EDLC are considered for the comparison. Different similarity calculation algorithms are then applied to the documents in different settings: using the whole text of each document, selecting only the positive sentences of the documents, and generating similarity scores from keywords automatically extracted from the documents. The experiment shows that the machine-generated keywords are mostly similar to the list curated by the domain experts. The study also suggests the preferable algorithms for similarity calculation and automated key-phrase extraction in the EDLC domain.
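
The abstract does not name the similarity algorithms it compares. As a rough illustration of the "whole text" setting only, the sketch below scores documents against a keyword list with bag-of-words cosine similarity. The choice of cosine similarity, the keyword list, the document texts, and all function names are assumptions made for this example, not details taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (raw term counts)."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty input has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical keyword list and documents, invented for illustration.
expert_keywords = ["electric double layer", "capacitor", "activated carbon", "electrolyte"]
documents = {
    "doc_edlc": "Activated carbon electrodes raise the capacitance of the electric double layer capacitor.",
    "doc_other": "The survey compares bird migration routes across northern Europe.",
}

# Score each whole document against the concatenated keyword list.
keyword_tokens = tokenize(" ".join(expert_keywords))
scores = {
    name: cosine_similarity(keyword_tokens, tokenize(text))
    for name, text in documents.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

In the other two settings the abstract describes, the same scoring function could be fed a filtered subset of sentences or a list of automatically extracted keywords in place of the full document text.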
