Research on an Improved TFIDF Algorithm Based on Semantic Association and Information Gain

许珂; 蒙祖强; 林啓峰


Abstract

Existing text feature extraction algorithms based on term frequency-inverse document frequency (TFIDF), including their improved variants, do not consider the semantic associations among words within a category. Features extracted without regard to semantics cannot describe the content of a document well. To extract features more accurately, this paper builds on information entropy and information gain, adds a semantic association factor for words, redesigns the weight equation, and proposes an improved TFIDF algorithm that combines semantics with information gain. The algorithm compensates for the loss of semantic information in purely statistical methods. Experimental results show that it effectively improves the precision of text classification.
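The abstract names the statistical ingredients of the method (TFIDF and information gain) but does not give the paper's weight equation. The following is therefore only a minimal sketch, assuming that a term's TFIDF weight is simply multiplied by its information gain over the class labels. The function names, the multiplicative combination, and the toy corpus are illustrative assumptions, not the authors' formula. The semantic association factor is left out because its definition is not stated in this record.

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain TF-IDF. docs is a list of token lists; returns one {term: weight} dict per doc."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency of each term
    return [
        {t: (c / len(d)) * math.log(n / df[t]) for t, c in Counter(d).items()}
        for d in docs
    ]

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(term, docs, labels):
    """IG(t) = H(C) - P(t) H(C | t present) - P(not t) H(C | t absent)."""
    classes = sorted(set(labels))
    present = [l for d, l in zip(docs, labels) if term in d]
    absent = [l for d, l in zip(docs, labels) if term not in d]

    def h(ls):
        return entropy([ls.count(c) for c in classes])

    n = len(docs)
    return h(labels) - len(present) / n * h(present) - len(absent) / n * h(absent)

def tfidf_ig(docs, labels):
    """Assumed combination: TF-IDF weight scaled by the term's information gain."""
    vocab = {t for d in docs for t in d}
    ig = {t: information_gain(t, docs, labels) for t in vocab}
    return [{t: w * ig[t] for t, w in doc.items()} for doc in tfidf(docs)]

# Hypothetical toy corpus: two "sport" and two "finance" documents.
docs = [
    ["ball", "goal", "the"],
    ["ball", "team", "the"],
    ["stock", "bank", "the"],
    ["stock", "market", "the"],
]
labels = ["sport", "sport", "finance", "finance"]
weights = tfidf_ig(docs, labels)
```

In this toy corpus, "the" appears in every document, so both its IDF and its information gain are zero. "ball" separates the two classes perfectly and keeps its full TFIDF weight. "goal" is rarer but occurs in only one sport document, so the scaling reduces its weight.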

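Bibliographic details

Source
《计算机应用研究》 (Application Research of Computers) | 2012, No. 2 | pp. 557-560 | 4 pages
Authors
许珂; 蒙祖强; 林啓峰
Affiliation
School of Computer and Electronic Information, Guangxi University, Nanning 530004 (all three authors)
Original format
PDF
Language of text
Chinese
Chinese Library Classification
Theory, methods
Keywords
term frequency-inverse document frequency (TFIDF); feature extraction; semantic association; information gain; text classification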

