Journal: 《电子科技大学学报》 (Journal of University of Electronic Science and Technology of China)

A News Keyword Extraction Algorithm Based on PageRank

         

Abstract

Most existing keyword extraction methods based on complex networks ignore the natural-language characteristics of the text when building the weighted text network, and they rarely draw on the classical algorithms of the complex-network field. Building on the PageRank algorithm, we propose a news keyword extraction method named LTWPR (located and TF-weighted PageRank) that takes both term frequency and human language habits into account. The algorithm introduces a term-frequency-shared weight, which distributes a node's term-frequency value to its links and so weights the edges between nodes, and defines a position weight coefficient that expresses the differing importance of words at different positions in a news article. In this way LTWPR considers both the local and the global features of the text network. Extensive experiments were conducted on news articles collected from Sina News. The results show that LTWPR covers the keywords tagged by the authors quickly and effectively and achieves better extraction results.
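The abstract names the ingredients of LTWPR but does not give its formulas, so the following is only a rough sketch of how such a ranking might be assembled, not the authors' implementation. Everything concrete here is an assumption made for illustration: the function name `ltwpr_sketch`, the co-occurrence window, the edge weight (each endpoint shares its term frequency evenly over its links, and an edge receives the sum of both shares), and the position coefficient (title words get a larger teleport share via the hypothetical `title_boost` parameter). Input is assumed to be already tokenized; Chinese news text would first need a word segmenter.

```python
from collections import Counter, defaultdict


def ltwpr_sketch(title, body, window=2, damping=0.85, title_boost=2.0, iters=50):
    """Score words with a TF-weighted, position-aware PageRank (illustrative only)."""
    tf = Counter(title + body)

    # Undirected co-occurrence graph: each word is linked to the next
    # (window - 1) words in the same segment (title or body).
    neighbors = defaultdict(set)
    for segment in (title, body):
        for i, w in enumerate(segment):
            for v in segment[i + 1 : i + window]:
                if v != w:
                    neighbors[w].add(v)
                    neighbors[v].add(w)

    words = list(tf)

    # ASSUMPTION: a node shares its term frequency evenly over its links;
    # the weight of edge (u, v) is share[u] + share[v].
    share = {w: tf[w] / len(neighbors[w]) if neighbors[w] else 0.0 for w in words}
    out_sum = {u: sum(share[u] + share[v] for v in neighbors[u]) for u in words}

    # ASSUMPTION: the position coefficient is modeled as a biased teleport
    # vector in which title words receive `title_boost` times the base mass.
    title_words = set(title)
    pos = {w: title_boost if w in title_words else 1.0 for w in words}
    total = sum(pos.values())
    teleport = {w: pos[w] / total for w in words}

    # Power iteration for weighted PageRank.
    score = dict(teleport)
    for _ in range(iters):
        new_score = {}
        for w in words:
            inflow = sum(
                score[u] * (share[u] + share[w]) / out_sum[u] for u in neighbors[w]
            )
            new_score[w] = (1 - damping) * teleport[w] + damping * inflow
        score = new_score
    return score
```

A caller could take the top-k keywords with `sorted(scores, key=scores.get, reverse=True)[:k]`. The edge weights capture local information (term frequency and adjacency), while the PageRank iteration propagates importance across the whole graph, which is one way to read the abstract's claim about combining local and global features.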
