International Conference on Smart City and Systems Engineering

Chinese Document Keyword Extraction Algorithm Based on FP-growth

Abstract

Existing keyword extraction algorithms suffer from heavy computation and complex calculation procedures. To address this, this paper proposes an FP-Growth-based algorithm for extracting keywords from Chinese documents. The FP-Growth algorithm mines word co-occurrence information, which excludes the interference of noise words. Semantic similarity computed over lexical chains eliminates the influence of synonyms. A TF-IDF and feature-fusion method then considers word frequency, part of speech and word position: it combines TF-IDF with a "double comparing method" to calculate the weights of the feature factors, and builds a word-weight function that computes the final weight of each candidate word. Experimental results show that, compared with traditional TF-IDF, the proposed method improves both accuracy and recall by about 10%.
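The abstract does not say exactly how FP-Growth is applied to the text. The following is a minimal sketch under stated assumptions: each sentence of an already-segmented document is treated as one transaction of words, and a word is kept as a candidate only if it belongs to some frequent itemset of two or more words. The names `build_fp_tree`, `fp_growth`, `cooccurring_words` and the sample sentences are hypothetical, and Chinese word segmentation is assumed to have been done beforehand.

```python
from collections import defaultdict


class FPNode:
    """One node of an FP-tree: an item, its count and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return item supports and header table."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {item: s for item, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, count in transactions:
        # Drop infrequent items; order the rest by descending support (ties by item).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return frequent, header


def fp_growth(transactions, min_support, suffix=()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    frequent, header = build_fp_tree(transactions, min_support)
    results = {}
    for item, s in frequent.items():
        itemset = (item,) + suffix
        results[frozenset(itemset)] = s
        # Conditional pattern base: the prefix path above each node holding `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        results.update(fp_growth(conditional, min_support, itemset))
    return results


def cooccurring_words(sentences, min_support):
    """Hypothetical filter: keep words that occur in a frequent itemset of size >= 2."""
    itemsets = fp_growth([(words, 1) for words in sentences], min_support)
    return {w for itemset in itemsets if len(itemset) >= 2 for w in itemset}


# Each sentence is one transaction of (already segmented) words.
sentences = [
    ["关键词", "提取", "算法"],
    ["关键词", "提取", "中文"],
    ["关键词", "提取", "文档"],
    ["噪声", "算法"],
    ["中文", "文档", "关键词"],
]
print(sorted(cooccurring_words(sentences, min_support=2)))
```

With `min_support=2`, 噪声 is dropped because it occurs only once, and 算法 is dropped because, although it occurs twice, it never forms a frequent pair with another word. This is one plausible reading of "excluding the interference of noise words", not necessarily the paper's exact criterion.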

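The weighting step can likewise be sketched if we assume a linear word-weight function over three factors: a smoothed TF-IDF score, a part-of-speech factor and a position factor. The factor tables and the coefficients `alpha`, `beta` and `gamma` are hypothetical placeholders. The paper derives its factor weights with its "double comparing method", which the abstract does not describe and which is not reproduced here; the lexical-chain synonym step is also omitted.

```python
import math
from collections import Counter

# Hypothetical factor tables: the paper's actual values come from its
# "double comparing method", which the abstract does not describe.
POS_FACTOR = {"n": 1.0, "v": 0.6, "a": 0.5}  # noun, verb, adjective
POSITION_FACTOR = {"title": 1.0, "first": 0.8, "body": 0.5}


def tf_idf(term, doc_terms, corpus):
    """Term frequency in the document times a smoothed inverse document frequency."""
    tf = Counter(doc_terms)[term] / len(doc_terms)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf


def fused_weight(term, doc_terms, corpus, pos, position,
                 alpha=0.5, beta=0.3, gamma=0.2):
    """Assumed linear word-weight function over TF-IDF, POS and position factors."""
    return (alpha * tf_idf(term, doc_terms, corpus)
            + beta * POS_FACTOR.get(pos, 0.0)
            + gamma * POSITION_FACTOR.get(position, POSITION_FACTOR["body"]))


def rank_keywords(candidates, doc_terms, corpus, top_k=3):
    """candidates maps term -> (POS tag, position label); return the top_k terms."""
    scores = {term: fused_weight(term, doc_terms, corpus, pos, position)
              for term, (pos, position) in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


doc = ["关键词", "提取", "关键词", "算法", "中文", "关键词"]
corpus = [set(doc), {"算法", "图像"}, {"算法", "网络"}]  # each document as a set of terms
candidates = {"关键词": ("n", "title"), "提取": ("v", "first"), "算法": ("n", "body")}
print(rank_keywords(candidates, doc, corpus))
```

In this toy example 算法 has the lowest TF-IDF score because it occurs in every document, yet it still narrowly outranks the verb 提取 thanks to its noun POS factor. This shows how strongly the factor weights shape the final ranking, which is why the paper computes them rather than fixing them by hand.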