Journal: 《智能系统学报》

Chinese web page feature extraction based on a comprehensive heuristic optimized by a genetic algorithm

Abstract

Feature extraction underlies information retrieval, text classification, text clustering, and automatic summarization. Traditional feature extraction methods cannot evaluate candidate feature words comprehensively and effectively. To address this shortcoming, this paper proposes a Chinese web page feature extraction method based on a comprehensive heuristic optimized by a genetic algorithm (GA). The method evaluates candidate features by combining several heuristics, namely word frequency, word correlation, part of speech (POS), and position, and uses the GA to optimize the weight of each heuristic. Comparative experiments on different test sets show that, relative to traditional methods, the proposed method effectively avoids the bias those methods introduce and obtains a representative feature set, which gives it a certain practical value.
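The idea of weighting several heuristics and letting a GA tune the weights can be illustrated with a minimal sketch. The abstract does not give the scoring formula, the GA operators, or the fitness function, so everything below is an assumption made for illustration: the weighted-sum score, the toy candidate terms and their heuristic values, the hand-labeled reference set, and the names `score`, `top_k`, `fitness`, `normalize`, and `evolve`. It should not be read as the paper's implementation.

```python
import random

# Hypothetical candidate terms, each with four heuristic scores in [0, 1]:
# (word frequency, correlation, part of speech, position). Toy values only.
CANDIDATES = {
    "genetic":   (0.9, 0.8, 0.7, 0.9),
    "algorithm": (0.8, 0.9, 0.7, 0.8),
    "feature":   (0.7, 0.7, 0.7, 0.6),
    "page":      (0.9, 0.2, 0.7, 0.1),
    "click":     (0.8, 0.1, 0.3, 0.2),
    "the":       (1.0, 0.0, 0.0, 0.5),
}
# Hypothetical hand-labeled feature words used to judge a weight vector.
REFERENCE = {"genetic", "algorithm", "feature"}


def score(heuristics, weights):
    """Assumed combination rule: weighted sum of the heuristic scores."""
    return sum(h * w for h, w in zip(heuristics, weights))


def top_k(weights, k=3):
    """Return the k highest-scoring candidate terms under the given weights."""
    ranked = sorted(CANDIDATES, key=lambda t: score(CANDIDATES[t], weights),
                    reverse=True)
    return set(ranked[:k])


def fitness(weights):
    """Assumed fitness: fraction of reference terms found in the top k."""
    return len(top_k(weights) & REFERENCE) / len(REFERENCE)


def normalize(weights):
    """Rescale weights to sum to 1; fall back to uniform if all are zero."""
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else [0.25] * len(weights)


def evolve(generations=30, pop_size=20, mutation_rate=0.2, seed=0):
    """Tiny GA: truncation selection, one-point crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [normalize([rng.random() for _ in range(4)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]      # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, 3)         # one-point crossover position
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:
                i = rng.randrange(4)
                child[i] = max(0.0, child[i] + rng.gauss(0, 0.1))
            children.append(normalize(child))
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("weights:", [round(w, 3) for w in best])
    print("selected terms:", sorted(top_k(best)))
```

In this toy data, weighting frequency alone ranks the stop word "the" first, whereas weight vectors that also reward correlation and part of speech recover the reference terms. This is the kind of bias from a single heuristic that the abstract says the combined method avoids.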
