Conference: International Conference on Intelligent Systems and Information Management
An empirical study of important keyword extraction techniques from documents

Abstract

Keyword extraction is an automated process that collects a set of terms summarizing a document. A keyword is a term that identifies the core information of a particular document. When a large number of documents must be analyzed to find relevant information, keyword extraction is a key approach, because it helps us gauge the depth of a document before reading it. In this paper, we give an overview of the approaches and algorithms that have been used for keyword extraction and compare them to identify the better approach for future work. We studied several algorithms for finding important keywords in a document: support vector machines (SVM), conditional random fields (CRF), NP-chunks, n-grams, multiple linear regression, and logistic regression. We found that SVM and CRF give better results, with CRF outperforming SVM on F1 score (the balance between precision and recall). On precision, SVM performs better than CRF, but on recall, logistic regression (logit) shows the higher result. We also found that two further approaches have been used in keyword extraction: a statistical approach and a machine learning approach. Statistical approaches give good results with statistical data, while machine learning approaches, which use training data, give better results than statistical approaches. Examples of statistical approaches are Expectation-Maximization, K-Nearest Neighbor, and Bayesian methods. Extractor and GenEx are examples of machine learning approaches to keyword extraction. Apart from these two approaches, the semantic relation between words is another key feature in keyword extraction techniques.
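
The abstract compares methods on precision, recall, and F1 score. As a minimal sketch of how these three measures relate for keyword extraction, the Python snippet below scores a set of extracted keywords against a reference set. The function name `precision_recall_f1` and the example keyword sets are illustrative assumptions and are not taken from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Score extracted keywords against a reference (gold) keyword set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Precision: share of extracted keywords that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference keywords that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data for illustration only (not results from the paper).
gold = {"keyword extraction", "svm", "crf", "f1 score"}
predicted = {"keyword extraction", "svm", "crf", "document", "paper"}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Because F1 is a harmonic mean, a method can lead on precision or on recall alone and still trail on F1, which is consistent with the different rankings the abstract reports for each measure.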