International Forum on Computer Science-Technology and Applications (IFCSTA 2009)

Research of Chinese Text Classification Methods Based on Semantic Vector and Semantic Similarity

Abstract

To overcome the limitations of traditional text classification approaches based on the bag-of-words representation, and to incorporate linguistic knowledge and a conceptual index into the text vector space model, we draw on two thesauri, HowNet and Tongyici Cilin (hereinafter Cilin), and describe each document with a semantic vector instead of a traditional keyword vector. The semantic vector is built by merging highly similar words and using a single concept, rather than a series of words, to describe each semantic feature. This not only reduces the feature dimension but also adds semantic information to the vector. We also classify texts using sentence (document) similarity based on simple vector distance, and conduct three groups of experiments. The results show that accuracy generally improves with semantic processing, which indicates that semantic mining is important and necessary for text classification.
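
The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the concept table below is a hypothetical toy stand-in for HowNet and Cilin, cosine similarity is assumed as the "simple vector distance", and a 1-nearest-neighbor rule is assumed for the classification step. All names (`TOY_CONCEPTS`, `semantic_vector`, `cosine`, `classify`) are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus mapping words to concept labels.
# The paper uses HowNet and Cilin for this; these entries are invented.
TOY_CONCEPTS = {
    "car": "VEHICLE", "automobile": "VEHICLE", "truck": "VEHICLE",
    "doctor": "MEDICAL", "physician": "MEDICAL", "hospital": "MEDICAL",
}


def semantic_vector(tokens):
    """Count concepts instead of surface words.

    Words missing from the toy thesaurus stand for themselves.
    """
    return Counter(TOY_CONCEPTS.get(t, t) for t in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed metric)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(tokens, labeled_docs):
    """Return the label of the most similar labeled document (assumed 1-NN)."""
    query = semantic_vector(tokens)
    best = max(labeled_docs, key=lambda doc: cosine(query, semantic_vector(doc[0])))
    return best[1]


# Toy training data: (tokens, label) pairs, invented for this example.
train = [
    (["car", "engine", "road"], "auto"),
    (["doctor", "patient", "clinic"], "health"),
]

print(classify(["automobile", "truck"], train))
```

In this toy setup the query shares no surface word with either training document, so a plain keyword vector would give zero similarity to both. Merging "car", "automobile", and "truck" into one concept is what lets the query match the first document, and it also shrinks the feature space from three dimensions to one.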