An empirical study of important keyword extraction techniques from documents


Abstract

Keyword extraction is an automated process that collects a set of terms giving an overview of a document. A term counts as a keyword according to how well it identifies the core information of a particular document. When a huge number of documents must be analyzed to find relevant information, keyword extraction is the key approach: it helps us understand the substance of a document even before we read it. In this paper, we give an overview of the different approaches and algorithms that have been used for keyword extraction and compare them to find the better approach for future work. We studied various algorithms, including support vector machines (SVM), conditional random fields (CRF), NP-chunks, n-grams, multiple linear regression, and logistic regression, for finding important keywords in a document. We found that SVM and CRF give the better results, with CRF scoring higher than SVM on F1 score (the balance between precision and recall). On precision, SVM shows a better result than CRF, but on recall, logistic regression (logit) shows the higher result. We also found two broader families of approaches used in keyword extraction: statistical approaches and machine learning approaches. Statistical approaches show good results with statistical data. Machine learning approaches, which use training data, provide better results than the statistical approaches. Examples of statistical approaches are Expectation-Maximization, K-Nearest Neighbor, and Bayesian methods. Extractor and GenEx are examples of machine learning approaches in the keyword extraction field. Apart from these two approaches, the semantic relation between words is another key feature in keyword extraction techniques.
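The abstract compares algorithms by precision, recall, and F1 score. As a minimal sketch of how these three measures relate for a keyword extractor, the snippet below scores a set of extracted keywords against a reference set. The function name `precision_recall_f1` and both keyword sets are hypothetical, invented for illustration; they are not taken from the paper and do not reproduce its results.

```python
def precision_recall_f1(predicted, gold):
    """Score extracted keywords against a reference (gold) keyword set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Precision: share of extracted keywords that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference keywords that were extracted.
    recall = true_positives / len(gold) if gold else 0.0

    # F1: harmonic mean of precision and recall (0 when both are 0).
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical keyword sets, invented for illustration only.
gold = {"keyword extraction", "svm", "crf", "f1 score"}
predicted = {"keyword extraction", "svm", "document", "crf", "approach"}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, a method can lead on precision or on recall alone and still trail on F1, which is consistent with the abstract reporting different winners for each measure.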


Bibliographic details

Source
International Conference on Intelligent Systems and Information Management | 2017 | 347p | 4 pages
Authors
H. M. Mahedi Hasan; Falguni Sanyal; Dipankar Chaki; Md. Haider Ali
Original format: PDF
Chinese Library Classification: TP18-53
Keywords
Feature extraction; Support vector machines; Semantics; Data mining; Training; Linear regression; Logistics


