首页> 中文期刊>西安电子科技大学学报(自然科学版) >面向文本分类的中文文本语义表示方法

面向文本分类的中文文本语义表示方法

     

摘要

Text representation based on word frequency statistics is often unsatisfactory because it ignores the semantic relationships between words, and considers them as independent features. In this paper, a new Chinese text semantic representation model is proposed by considering contextual semantic and background information on the words in the text. The method captures the semantic relationships between words using Wikipedia as a knowledge base. Words with strong semantic relationships are combined into a word-package as indicated by a graph node, which is weighted with the sum of the number and frequency of the words it contains. The contextual relationship between words in different word-packages is stated by a directed edge, which is weighted with the maximum weight of its adjacent nodes. The model retains the contextual information on each word with a large extent. Meanwhile, the semantic meaning between words is strengthened. Experimental results of Chinese text classification show that the proposed model can express the content of a text accurately and improve the performance of text classification. Compared to Support Vector Machines, Text Semantic Graph-based Classification can improve the efficiency by 7. 8%, reduce the error rate by 1/3, and show more stability.%为了解决词频统计文本表示方法中词语间语义信息缺失的问题,在考虑文本中词语上下文语境和语义背景信息的基础上,提出了一种新的中文文本表示模型——文本语义图.该方法利用维基百科作为知识背景计算文本中实意特征词语的语义关联,将具有较强语义关系的词语合并成词包作为图的节点,节点权值用词包所包含词语的数目及词频计算;不同词包中词语间的上下文关系作为图的有向边,有向边权值用其邻接节点的最大权值表示.该模型在较大程度地保留文本中词语上下文信息的同时强化了词语间语义内涵.通过中文文本分类实验,文本语义图分类方法相对于支持向量机分类效率提升了7.8%,同时错误率减少了1/3,且表现出更好的稳定性.实验结果表明在文本分类应用中,文本语义图模型能够有效地表示文本内容.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号