International Joint Conference on Natural Language Processing

A Study of Semi-Discrete Matrix Decomposition for LSI in Automated Text Categorization



Abstract

This paper proposes using Latent Semantic Indexing (LSI), computed with the semi-discrete matrix decomposition (SDD), for text categorization. SDD is a recent alternative decomposition for LSI that achieves similar performance at a much lower storage cost. Here, LSI is applied to text categorization by constructing new category features as combinations or transformations of the original features. In experiments on a Chinese Library Classification data set, we compare its accuracy with that of a k-Nearest Neighbor (k-NN) classifier, and the results show that k-NN based on LSI is sometimes significantly better. Much future work remains, but the results indicate that LSI is a promising technique for text categorization.
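The abstract names two components, SDD-based LSI and a k-NN classifier, but does not give their details. The sketch below shows how such a pipeline could look in NumPy. It is an illustration under stated assumptions, not the authors' implementation. It assumes the standard greedy SDD construction, where each term is d·x·yᵀ with x and y restricted to {-1, 0, 1} and d ≥ 0. Several choices are hypothetical and not taken from the paper: the function names `best_ternary`, `sdd`, and `lsi_knn_predict`, the starting vector, the fixed number of inner iterations, the projection of queries and documents, and the use of cosine similarity with a majority vote.

```python
import numpy as np
from collections import Counter


def best_ternary(s):
    """Return x in {-1, 0, 1}^len(s) maximizing (x @ s)**2 / (x @ x) for a fixed s."""
    order = np.argsort(-np.abs(s))                  # indices by decreasing |s_i|
    partial = np.cumsum(np.abs(s)[order])           # sum of the J largest |s_i|
    sizes = np.arange(1, len(s) + 1)
    keep = int(np.argmax(partial**2 / sizes)) + 1   # best number of nonzeros J
    x = np.zeros_like(s)
    x[order[:keep]] = np.sign(s[order[:keep]])
    return x


def sdd(A, rank, inner_iters=20):
    """Greedy semi-discrete decomposition A ~ X @ diag(d) @ Y.T (hypothetical helper).

    X and Y have entries in {-1, 0, 1}; d holds nonnegative weights.
    """
    m, n = A.shape
    R = np.asarray(A, dtype=float).copy()           # residual matrix
    X, Y, d = np.zeros((m, rank)), np.zeros((n, rank)), np.zeros(rank)
    for i in range(rank):
        # Assumed start: unit vector on the residual column with the largest norm.
        y = np.zeros(n)
        y[np.argmax(np.linalg.norm(R, axis=0))] = 1.0
        for _ in range(inner_iters):                # alternate: fix y, solve x; fix x, solve y
            x = best_ternary(R @ y)
            y = best_ternary(R.T @ x)
        nx, ny = x @ x, y @ y
        if nx == 0 or ny == 0:                      # residual exhausted: stop early
            break
        d[i] = (x @ R @ y) / (nx * ny)              # least-squares weight for this term
        R -= d[i] * np.outer(x, y)                  # residual norm can only shrink
        X[:, i], Y[:, i] = x, y
    return X, d, Y


def lsi_knn_predict(X, d, Y, train_labels, q, n_neighbors=3):
    """Label term vector q by majority vote of its nearest training documents.

    Assumed projection: documents -> columns of diag(d) @ Y.T, query -> X.T @ q,
    so their plain inner product equals q @ (X @ diag(d) @ Y.T).
    """
    docs = d[:, None] * Y.T                         # rank x n_docs
    qk = X.T @ q                                    # query in the reduced space
    norms = np.linalg.norm(docs, axis=0) * np.linalg.norm(qk)
    sims = (qk @ docs) / np.maximum(norms, 1e-12)   # cosine similarity
    nearest = np.argsort(-sims)[:n_neighbors]
    votes = Counter(train_labels[j] for j in nearest)
    return votes.most_common(1)[0][0]


# Toy term-by-document matrix: terms 0-1 occur in class "a", terms 2-3 in class "b".
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
labels = ["a", "a", "b", "b"]
X, d, Y = sdd(A, rank=2)
print(lsi_knn_predict(X, d, Y, labels, np.array([1.0, 0, 0, 0])))  # prints a
```

The storage saving that the abstract attributes to SDD comes from the factors themselves. Each entry of X and Y takes one of three values, so it fits in about 2 bits, whereas an SVD factor needs a full floating-point number per entry. Only the k weights in d are stored as floats.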
