
Keyword Extraction Performance Analysis


Abstract

This paper presents a combined survey and evaluation of keyword extraction methods, comparing them on datasets of various sizes, forms, and genres. We use four datasets: Amazon product data (Automotive), SemEval 2010, TMDB, and Stack Exchange. In addition, a subset of 100 Amazon product reviews is annotated and used for evaluation, to our knowledge for the first time. The datasets are evaluated with five Natural Language Processing approaches (3 unsupervised and 2 supervised): TF-IDF, RAKE, TextRank, LDA, and a Shallow Neural Network. We use a ten-fold cross-validation scheme and measure the performance of these approaches using recall, precision, and F-score. Our analysis and results provide guidelines on which approaches to use for different types of datasets. Furthermore, our results indicate that certain approaches perform better on certain datasets because of inherent characteristics of the data.
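To make the evaluation described in the abstract concrete, the following is a minimal sketch of how one extractor (TF-IDF) might be scored against annotated keywords with precision, recall, and F-score. It is not the authors' implementation: the function names `tfidf_keywords` and `precision_recall_f1`, the whitespace tokenizer, the toy documents, and the gold keywords are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Return the top_k terms of docs[doc_index] ranked by TF-IDF.

    Hypothetical helper: uses naive lowercase whitespace tokenization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    tf = Counter(tokenized[doc_index])
    total = len(tokenized[doc_index])
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def precision_recall_f1(predicted, gold):
    """Score predicted keywords against gold keywords by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy corpus standing in for product reviews (invented for this sketch).
    docs = ["brake pads squeak", "oil filter fits", "oil change kit"]
    predicted = tfidf_keywords(docs, doc_index=1, top_k=2)
    print(predicted, precision_recall_f1(predicted, ["filter", "oil"]))
```

In a ten-fold cross-validation setting, scores like these would be computed on each held-out fold and averaged; exact-match scoring is only one possible matching criterion, and stemmed or partial matching would give different numbers.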
