Papers' similarity based on the summarization merits

Abstract

This paper proposes a research paper similarity system that measures the similarity of an input paper to other papers based on a summarized version of each paper. Currently, the system takes into account two types of summarization, based on two types of keywords: normal keywords and stemmed keywords. In contrast to existing recommendation systems for research papers, which use citation and/or PageRank data, our system works independently of such data and depends on the textual content of the paper. Our experiment, conducted with Google Scholar, a citation-based paper recommendation system, as the baseline, shows that citation-based systems such as Google Scholar are prone to ignoring more related but less cited papers, while systems based on the textual content of papers can be more successful at recommending papers that are more similar to the input paper. However, comparing the full textual content of papers is a time-consuming and costly process, whereas producing summarized versions of papers and comparing them can be faster, and the summaries can be reused. In addition, we show that the ranked list that Google Scholar returns can be formulated and predicted based on citation scores. Furthermore, we show statistically that normal keyword summarization can be the better choice of the two types of summarization. As future work, we will build a synonym-acronym dictionary for scholarly papers in the computer science and engineering field, in order to add synonym-acronym comparison to the system.
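The abstract does not say how the keyword summaries are built or compared, so the following is only a minimal sketch of the general idea, under stated assumptions: each paper is summarized as its most frequent non-stop-word tokens, the "stemmed" variant uses a crude suffix stripper standing in for a real stemmer, and two summaries are compared with Jaccard similarity. The stop-word list, the function names, and the choice of Jaccard are all illustrative assumptions, not the authors' method.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "from",
              "in", "is", "of", "on", "or", "that", "the", "to", "with"}


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_summary(text, top_k=10, stem=False):
    """Summarize a text as the set of its top_k most frequent keywords."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOP_WORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return {word for word, _ in Counter(tokens).most_common(top_k)}


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(query_text, candidates, top_k=10, stem=False):
    """Rank candidate papers (title -> text) by summary similarity to the query."""
    query = keyword_summary(query_text, top_k, stem)
    scored = [(jaccard(query, keyword_summary(text, top_k, stem)), title)
              for title, text in candidates.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    papers = {"A": "ranked paper research", "B": "protein folding"}
    print(rank_by_similarity("ranking research papers", papers, stem=False))
    print(rank_by_similarity("ranking research papers", papers, stem=True))
```

Because each summary is computed once per paper and stored as a small set, it can be reused across queries, which is the practical advantage over full-text comparison that the abstract points to. Note that this toy example does not reproduce the paper's statistical comparison of the two keyword types; it only shows where the two variants differ.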