Journal: Computer Engineering and Applications (《计算机工程与应用》)

Keyword Extraction Method for Chinese News Web Pages Based on Combined Features

         

Abstract

Considering the characteristics of Chinese news Web pages, this paper combines several features, including statistical, positional, and part-of-speech (POS) features, to evaluate the weight of candidate keywords. Because some word segmentation results do not reflect the theme of a page well, the paper also proposes a compound word generation method based on a directed graph, which finds high-frequency adjacent words and joins them into compound words. The experimental results show that the method achieves a considerable improvement in efficiency over the conventional TF-IDF method and can effectively extract keywords from news Web pages.
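
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the text is already segmented into word lists, represents the directed graph as counts of "word a is immediately followed by word b" edges, and treats any edge seen at least `min_count` times as a compound word. The function names, the `min_count` threshold, the title boost, and the POS weights are all hypothetical and not taken from the paper.

```python
from collections import Counter


def adjacency_edges(sentences):
    """Count directed edges (a, b), where b immediately follows a in a sentence."""
    edges = Counter()
    for tokens in sentences:
        edges.update(zip(tokens, tokens[1:]))
    return edges


def compound_words(sentences, min_count=2):
    """Join adjacent words whose edge count reaches min_count (assumed threshold)."""
    edges = adjacency_edges(sentences)
    return {a + b: n for (a, b), n in edges.items() if n >= min_count}


# Assumed POS weights: nouns count fully, verbs half, everything else less.
POS_WEIGHTS = {"n": 1.0, "v": 0.5}


def combined_weight(tf, in_title, pos_tag, title_boost=2.0):
    """Combine statistical (tf), position (title), and POS features by multiplication.

    The multiplicative form is an illustrative assumption, not the paper's formula.
    """
    position_factor = title_boost if in_title else 1.0
    return tf * position_factor * POS_WEIGHTS.get(pos_tag, 0.2)


# Pre-segmented toy input; a real pipeline would get this from a word segmenter.
sentences = [
    ["人工", "智能", "技术", "发展"],
    ["人工", "智能", "应用"],
    ["技术", "应用"],
]
compounds = compound_words(sentences)
```

In this toy input only the edge from "人工" to "智能" occurs twice, so it is the only pair joined into a compound candidate; the candidate could then be scored with `combined_weight` like any other term.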
