Chinese Document Keyword Extraction Algorithm Based on FP-growth

Abstract

To address the heavy computation and complex calculation process of existing keyword extraction algorithms, this paper proposes an FP-Growth-based algorithm for extracting keywords from Chinese documents. The FP-Growth algorithm mines word co-occurrence information, which excludes the interference of noise words, and semantic similarity computed over lexical chains eliminates the influence of synonyms. Using TF-IDF together with a feature fusion method that considers the frequency, part of speech, and position of each word, the algorithm combines TF-IDF with the "double comparing method" to calculate the weights of the characteristic factors, then builds a word weight function to compute the final weight of each candidate word. Experimental results show that the proposed method improves the accuracy rate and recall rate by about 10% compared with traditional TF-IDF.
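
The co-occurrence mining step can be sketched as follows. This is a minimal, self-contained FP-Growth in Python, not the authors' implementation: the abstract does not state the transaction unit, the support threshold, or how the text is segmented, so treating each sentence as one transaction, `min_support=2`, and the sample tokens (English stand-ins for segmented Chinese words) are all assumptions. The names `Node`, `build_tree`, and `fp_growth` are illustrative.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: an item, its count, and tree links."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table (item -> nodes holding it) and the
    support of every frequent item.
    """
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, most frequent first (ties by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset."""
    header, freq = build_tree(transactions, min_support)
    for item, support in freq.items():
        itemset = suffix | {item}
        yield itemset, support
        # Conditional pattern base: the prefix path of each node for item.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)


# Assumed input: one already-segmented sentence per transaction.
sentences = [
    ["keyword", "extraction", "algorithm"],
    ["keyword", "extraction", "document"],
    ["keyword", "algorithm", "noise"],
    ["extraction", "algorithm", "keyword"],
]
patterns = dict(fp_growth([(s, 1) for s in sentences], min_support=2))

# Candidate words: those that co-occur frequently with at least one other word.
candidates = set()
for itemset in patterns:
    if len(itemset) >= 2:
        candidates |= itemset
```

In this toy input, `document` and `noise` never join a frequent itemset of two or more words, so they are dropped from `candidates`. The lexical-chain and TF-IDF weighting steps named in the abstract would then operate on the remaining candidates.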

Bibliographic details

Source: International Conference on Smart City and Systems Engineering, 2016, pp. 202-205 (4 pages)
Authors: Meng Zhao; Wanjun Yu; Wenjing Lu; Quan Liu; Jinxiao Li
Original format: PDF
Keywords: Speech; Itemsets; Semantics; Feature extraction; Vocabulary; Tagging

Similar literature

1. Keyword Extraction Based on tf/idf for Chinese News Document [J]. LI Juanzi, FAN Qina, ZHANG Kuo. Wuhan University Journal of Natural Sciences, 2007, No. 5.
2. Automatic keyword extraction from documents based on multiple content-based measures [J]. Kun Yue, Wei-Yi Liu, Li-Ping Zhou. International Journal of Computer Systems Science & Engineering, 2011, No. 2.
3. A visual attention-based keyword extraction for document classification [J]. Wu Xing, Du Zhikang, Guo Yike. Multimedia Tools and Applications, 2018, No. 19.
4. Chinese Document Keyword Extraction Algorithm Based on FP-growth [C]. Meng Zhao, Wanjun Yu, Wenjing Lu, et al. International Conference on Smart City and Systems Engineering, 2016.
5. Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing [D]. Csomai, Andras. 2008.
6. Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification [O]. Jie Hu, Shaobo Li, Yong Yao. 2018.
7. Algorithm of Keywords Extraction about Power Documents Based on Hadoop [O]. Tong Wang, Yongzhi Wang, Liang Jin. 2016.
