2017 ACM/IEEE Joint Conference on Digital Libraries

Identifying Important Citations Using Contextual Information from Full Text


Abstract

In this paper we address the problem of classifying cited work as important or non-important to the developments presented in a research publication. This task is vital for algorithmic techniques that detect and follow emerging research topics, and for qualitatively measuring the impact of publications in rapidly growing scholarly big data. We consider cited work important to a publication if that work is used or extended in some way. If a reference is cited as background work or for the purpose of comparing results, the cited work is considered non-important. By employing five classification techniques (Support Vector Machine, Naïve Bayes, Decision Tree, K-Nearest Neighbors and Random Forest) on an annotated dataset of 465 citations, we explore the effectiveness of eight previously published features and six novel features (including context-based, cue-word-based and textual features). Within this set, our new features are among the best performing. Using the Random Forest classifier we achieve an overall classification performance of 0.91 AUC.
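
To make the reported metric concrete, the following is a minimal sketch of how a cue-word feature might be scored and how AUC can be computed for a binary important/non-important labelling. It is not the authors' implementation: the cue phrases in `USE_CUES`, the helper names `cue_score` and `auc`, and the four example citation contexts are all hypothetical, since the abstract does not list the actual cue words or data.

```python
from typing import Sequence

# Hypothetical cue phrases (assumption): the abstract does not give the real lists.
USE_CUES = ("we use", "we extend", "based on", "we adopt", "following")


def cue_score(context: str, cues: Sequence[str] = USE_CUES) -> int:
    """Count how many cue phrases occur in a citation context (case-insensitive)."""
    text = context.lower()
    return sum(1 for cue in cues if cue in text)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up citation contexts (assumption): 1 = important, 0 = non-important.
examples = [
    ("We extend the model of [3] and we use its training procedure.", 1),
    ("Our parser is based on the algorithm in [7].", 1),
    ("Citation analysis has a long history [1, 2].", 0),
    ("Our results are higher than those reported in [5].", 0),
]
labels = [y for _, y in examples]
scores = [cue_score(text) for text, _ in examples]
print(scores, auc(labels, scores))
```

An AUC of 0.5 corresponds to a scorer that ranks important and non-important citations no better than chance, and 1.0 to a perfect ranking. A trained classifier such as a Random Forest would supply predicted probabilities in place of the raw cue counts used here.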
