Journal of Qingdao University (Natural Science Edition)

A TextRank-Based Method for Extracting Product Features from Online Reviews

     

Abstract

The classical TF-IDF algorithm ignores the relationships between words when extracting feature words from documents, which lowers extraction accuracy. To address this problem, a TextRank word-graph construction method based on word2vec weighting is proposed. First, a corpus of online product reviews is collected with a web crawler and preprocessed by word segmentation, part-of-speech tagging, and noun extraction. Second, word2vec is used to build a similarity matrix between tokens. Finally, the token-to-token similarities obtained from word2vec serve as word-influence weights to improve the classical TextRank product feature extraction method. Experimental data show that, compared with the traditional TextRank product feature extraction method, the improved method raises precision by 5% and recall by 2.9%, and can effectively improve the accuracy of product feature extraction in practical engineering applications.
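The abstract does not give the update rule or the graph-construction details, so the following is only a minimal sketch of the general idea. It assumes the standard weighted TextRank iteration, WS(v_i) = (1 - d) + d * Σ_j [w_ji / Σ_k w_jk] * WS(v_j), with cosine similarity between word vectors as the edge weight w. The co-occurrence window, the damping factor d = 0.85, the function names, and the toy two-dimensional vectors (standing in for trained word2vec embeddings) are all assumptions made for illustration, not details from the paper. The input is assumed to be already segmented and filtered to nouns.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(sentences, vectors, window=3):
    """Link nouns that co-occur within `window` tokens (assumed rule).

    Each undirected edge is weighted by the cosine similarity of the
    two word vectors; non-positive similarities are dropped.
    """
    graph = defaultdict(dict)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + window]:
                if w == v or w not in vectors or v not in vectors:
                    continue
                sim = cosine(vectors[w], vectors[v])
                if sim > 0:
                    graph[w][v] = sim
                    graph[v][w] = sim
    return graph


def weighted_textrank(graph, d=0.85, iters=100, tol=1e-6):
    """Iterate the weighted TextRank update until scores stabilise."""
    if not graph:
        return {}
    scores = {w: 1.0 for w in graph}
    # Total edge weight leaving each node, used to normalise its votes.
    out_sum = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    for _ in range(iters):
        new = {}
        for w in graph:
            rank = sum(graph[v][w] / out_sum[v] * scores[v] for v in graph[w])
            new[w] = (1 - d) + d * rank
        delta = max(abs(new[w] - scores[w]) for w in graph)
        scores = new
        if delta < tol:
            break
    return scores


# Toy input: review sentences already reduced to nouns (hypothetical data).
sentences = [
    ["screen", "battery", "price"],
    ["battery", "screen", "camera"],
    ["price", "delivery"],
]
# Toy 2-d vectors standing in for trained word2vec embeddings (hypothetical).
vectors = {
    "screen": [0.9, 0.1],
    "battery": [0.8, 0.3],
    "camera": [0.7, 0.2],
    "price": [0.2, 0.9],
    "delivery": [0.1, 0.8],
}

graph = build_graph(sentences, vectors)
scores = weighted_textrank(graph)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.4f}")
```

In a real pipeline the toy vectors would presumably be replaced by embeddings trained on the review corpus, and the highest-scoring nouns would be taken as candidate product features.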
