Journal: Computer Engineering and Design (《计算机工程与设计》)

An Ensemble Method for Multi-Label Text Classification Based on Latent Semantic Indexing

         

Abstract

To address concept ambiguity and the underlying semantic structure in multi-label text classification, an ensemble classification method is proposed that combines the random forest (RF) algorithm with latent semantic indexing (LSI). Randomly partitioning the vocabulary increases the diversity of the ensemble and yields different orthogonal projections of the low-dimensional latent semantic space; LSI is then performed on the basis of these projections. Random forests handle binary classification problems effectively, while latent semantics reveal the underlying semantic structure of the text, so combining the two provides both diversity across the ensemble and accuracy of the individual members. Experimental results on the Yahoo dataset verify the effectiveness of the method, which outperforms other methods in Hamming loss, coverage, one-error, and average precision.
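The abstract names four evaluation measures without defining them. As a rough illustration, the sketch below computes Hamming loss, one-error (sometimes rendered as "first error"), coverage, and average precision under their commonly used multi-label definitions. It is not taken from the paper: the function names, the input layout (one set of true label indices and one list of per-label scores per document), and the assumption that every document has at least one true label are all choices made here for illustration.

```python
from typing import List, Sequence, Set


def ranks(scores: Sequence[float]) -> List[int]:
    """Rank labels by descending score; the top-scored label gets rank 1."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    rank = [0] * len(scores)
    for position, label in enumerate(order, start=1):
        rank[label] = position
    return rank


def hamming_loss(true_sets: Sequence[Set[int]],
                 pred_sets: Sequence[Set[int]],
                 n_labels: int) -> float:
    """Fraction of (document, label) pairs that are predicted wrongly."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * n_labels)


def one_error(true_sets: Sequence[Set[int]],
              score_rows: Sequence[Sequence[float]]) -> float:
    """Fraction of documents whose top-ranked label is not a true label."""
    misses = 0
    for t, s in zip(true_sets, score_rows):
        top = max(range(len(s)), key=lambda j: s[j])
        if top not in t:
            misses += 1
    return misses / len(true_sets)


def coverage(true_sets: Sequence[Set[int]],
             score_rows: Sequence[Sequence[float]]) -> float:
    """Average depth in the ranking needed to cover all true labels, minus 1."""
    total = 0
    for t, s in zip(true_sets, score_rows):
        r = ranks(s)
        total += max(r[j] for j in t) - 1
    return total / len(true_sets)


def average_precision(true_sets: Sequence[Set[int]],
                      score_rows: Sequence[Sequence[float]]) -> float:
    """Average, over true labels, of the share of true labels ranked at or above each one."""
    total = 0.0
    for t, s in zip(true_sets, score_rows):
        r = ranks(s)
        per_label = (sum(1 for k in t if r[k] <= r[j]) / r[j] for j in t)
        total += sum(per_label) / len(t)
    return total / len(true_sets)


# Toy data (hypothetical): two documents, three labels.
true_sets = [{0, 2}, {1}]
score_rows = [[0.9, 0.2, 0.6], [0.3, 0.4, 0.8]]
# Predicted label sets obtained here by thresholding the scores at 0.5.
pred_sets = [{j for j, v in enumerate(row) if v > 0.5} for row in score_rows]
```

Lower values are better for Hamming loss, one-error, and coverage, while higher values are better for average precision.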
