Source: PLoS One

Important citation identification by exploiting content and section-wise in-text citation count



Abstract

A citation is regarded as a potential parameter for determining the linkage between research articles. It has been used extensively in many academic tasks, such as calculating the impact factor of journals and the h-index of researchers, allocating research grants, and finding the latest research trends. The current state of the art contends that not all citations are of equal importance. Based on this argument, the citation classification community now categorizes citations as important or non-important. The community has proposed different approaches to identifying important citations, such as citation count-based, context-based, metadata-based, and text-based approaches. However, the contemporary state of the art ignores potentially significant features that can play a vital role in citation classification. This research presents a novel approach to binary citation classification that exploits section-wise in-text citation frequencies, similarity score, and overall citation count-based features. The study also introduces a novel machine learning-based approach for assigning appropriate weights to the logical sections of research papers; the weights are allocated to citations according to the sections in which they appear. To perform the classification, we used three classification techniques: Support Vector Machine, Kernel Linear Regression, and Random Forest. The experiments were performed on two annotated benchmark datasets that contain 465 and 311 citation pairs of research articles, respectively. The results show that the proposed approach attained an improved precision (0.84 vs. 0.72) over the contemporary state-of-the-art approach.
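The idea of section-wise weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The section names, the weight values, the threshold, and every function name below are hypothetical. The abstract says the weights are learned with machine learning and the final decision is made by trained classifiers, whereas this sketch uses fixed placeholder weights and a plain threshold, and it leaves out the similarity score and overall citation count features.

```python
from collections import Counter

# Hypothetical placeholder weights for the logical sections of a paper.
# The abstract states that weights are learned; these values are invented
# purely for illustration.
SECTION_WEIGHTS = {
    "introduction": 0.1,
    "related_work": 0.1,
    "methodology": 0.4,
    "results": 0.3,
    "discussion": 0.1,
}


def section_counts(mentions):
    """Count in-text citations of one cited paper per section.

    `mentions` is a list of section names, one entry per in-text citation.
    """
    return Counter(mentions)


def weighted_score(mentions, weights=SECTION_WEIGHTS):
    """Sum the per-section counts, each scaled by its section weight."""
    counts = section_counts(mentions)
    return sum(weights.get(section, 0.0) * n for section, n in counts.items())


def is_important(mentions, threshold=0.5):
    """Stand-in for a trained classifier: a simple threshold on the score."""
    return weighted_score(mentions) >= threshold


def precision(predicted, actual):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fp) if (tp + fp) else 0.0
```

For example, a paper cited twice in the methodology and once in the introduction scores 2 × 0.4 + 1 × 0.1 under these placeholder weights, so `is_important` labels it important at the default threshold, while a single mention in the introduction does not reach it.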
