n-grams Feature Weighting Algorithm Based on Relevance and Semantics

QIU Yunfei (邱云飞); LIU Shixing (刘世兴); LIN Mingming (林明明); SHAO Liangshan (邵良杉)


Abstract

Using n-grams as text classification features tends to lower classification accuracy, and n-gram weighting schemes usually ignore the redundancy and relevance among the words they contain. To address these problems, this paper proposes an n-grams feature weighting algorithm based on relevance and semantics. During text preprocessing, feature reduction is applied to the n-grams to lower their internal redundancy. The n-grams are then weighted according to two factors: the relevance between the words inside each n-gram and the classes, and the semantic similarity between the n-gram and the test set. Experiments on the Sogou Chinese news corpus and the NetEase text classification corpus show that the algorithm selects n-gram features with high class relevance and low redundancy, and reduces the amount of sparse data produced when quantifying the test set.
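The abstract gives no formulas, so the following is only a minimal sketch of what a relevance-minus-redundancy score for an n-gram could look like, in the spirit of the mRMR keyword listed for the paper. Everything in it is an assumption made for illustration: the function names `mutual_information` and `ngram_weight`, the use of binary word presence per document, and mutual information as the measure of both relevance and redundancy. The semantic similarity factor between an n-gram and the test set is not modelled here.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def ngram_weight(ngram, docs, labels):
    """Hypothetical mRMR-style score for one n-gram.

    ngram  : tuple of words
    docs   : list of token sets, one per training document
    labels : list of class labels, aligned with docs
    Returns mean word-class relevance minus mean word-word redundancy.
    """
    # Binary presence vector of each word over the documents (an assumption).
    presence = {w: [int(w in d) for d in docs] for w in ngram}
    relevance = sum(
        mutual_information(presence[w], labels) for w in ngram
    ) / len(ngram)
    pairs = list(combinations(ngram, 2))
    if not pairs:  # a unigram has no internal redundancy
        return relevance
    redundancy = sum(
        mutual_information(presence[a], presence[b]) for a, b in pairs
    ) / len(pairs)
    return relevance - redundancy
```

On a toy corpus where two words always co-occur, such a score penalises the bigram made of those two words relative to a bigram whose words carry different information, which is the behaviour the abstract describes as selecting features with high class relevance and low redundancy.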

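Bibliographic record

Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》) | 2015, No. 11 | pp. 992-1001 | 10 pages
Authors
QIU Yunfei (邱云飞); LIU Shixing (刘世兴); LIN Mingming (林明明); SHAO Liangshan (邵良杉)
Author affiliations

School of Software, Liaoning Technical University, Huludao 125105;

School of Software, Liaoning Technical University, Huludao 125105;

School of Software, Liaoning Technical University, Huludao 125105;

Institute of System Engineering, Liaoning Technical University, Huludao 125105

Original format: PDF
Language of full text: Chinese
Chinese Library Classification: text information processing
Keywords
maximum relevance minimum redundancy (mRMR); semantic similarity; n-grams; feature weighting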

