A Chinese Text Semantic Representation Method for Text Classification

Song Shengli (宋胜利); Wang Shaolong (王少龙); Chen Ping (陈平)

Abstract

Text representations based on word-frequency statistics are often unsatisfactory because they treat words as independent features and ignore the semantic relationships between them. This paper proposes a new Chinese text representation model, the text semantic graph, which takes into account both the context in which words occur and their semantic background information. The method uses Wikipedia as a knowledge base to compute the semantic relatedness of the content-bearing feature words in a text. Words with strong semantic relationships are merged into a word-package, which becomes a node of the graph; the node is weighted with the sum of the number and the frequency of the words it contains. The contextual relationship between words in different word-packages is represented by a directed edge, whose weight is the maximum weight of its adjacent nodes. The model retains the contextual information of each word to a large extent while strengthening the semantic connections between words. In Chinese text classification experiments, classification based on the text semantic graph improved efficiency by 7.8% relative to Support Vector Machines, reduced the error rate by one third, and showed better stability. The results indicate that the text semantic graph model represents text content effectively in text classification applications.
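The graph construction summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the Wikipedia-based relatedness measure, the merging threshold, or the exact definition of "context", so `relatedness` is a hypothetical caller-supplied function, `threshold` is an assumed cut-off, and two words are treated as contextually related when they are adjacent in the token sequence.

```python
from collections import Counter
from itertools import combinations


def build_text_semantic_graph(tokens, relatedness, threshold=0.5):
    """Build a toy text semantic graph from a tokenized text.

    relatedness(a, b) -> float is a hypothetical stand-in for the
    Wikipedia-based measure; threshold is an assumed cut-off.
    """
    freq = Counter(tokens)
    words = sorted(freq)

    # Union-find: merge words whose relatedness reaches the threshold.
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if relatedness(a, b) >= threshold:
            parent[find(a)] = find(b)

    # Each word-package is one node, identified by its sorted member tuple.
    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    package_of = {w: tuple(members)
                  for members in groups.values() for w in members}

    # Node weight: number of distinct words plus their total frequency.
    node_weight = {pkg: len(pkg) + sum(freq[w] for w in pkg)
                   for pkg in set(package_of.values())}

    # Assumed context rule: consecutive words in different packages
    # yield a directed edge weighted by the heavier adjacent node.
    edge_weight = {}
    for left, right in zip(tokens, tokens[1:]):
        src, dst = package_of[left], package_of[right]
        if src != dst:
            edge_weight[(src, dst)] = max(node_weight[src], node_weight[dst])
    return node_weight, edge_weight


# Toy relatedness table (invented for illustration only).
toy_scores = {frozenset(("football", "soccer")): 0.9}


def toy_relatedness(a, b):
    return toy_scores.get(frozenset((a, b)), 0.0)


tokens = ["football", "fans", "watch", "soccer", "fans"]
nodes, edges = build_text_semantic_graph(tokens, toy_relatedness)
```

In this toy input, "football" and "soccer" are merged into one word-package, so the node for that package carries more weight than either single-word node, and every edge touching it inherits that larger weight. Chinese text would first need word segmentation to produce `tokens`; that step is outside the sketch.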

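Bibliographic Record

Source
Journal of Xidian University (Natural Science Edition) | 2013, No. 2 | pp. 89-97, 129 | 10 pages
Authors
Song Shengli (宋胜利); Wang Shaolong (王少龙); Chen Ping (陈平)
Affiliation
Research Institute of Software Engineering, Xidian University, Xi'an, Shaanxi 710071
Original format: PDF
Language of text: Chinese
CLC classification: Information processing
Keywords
classification; knowledge representation; similarity; text semantic graph
Database entry time: 2023-07-25 18:05:38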

