Document categorization based on minimum loss of reconstruction information

Abstract

In this paper we present and validate a novel approach to single-label multi-class document categorization. The approach relies on a statistical property of Principal Component Analysis (PCA): it minimizes the reconstruction error of the training documents used to compute a low-rank category transformation matrix. This matrix projects the training documents of a given category into a new low-rank space and then reconstructs them in the original space with minimum loss of information. The proposed method, called the Minimum Loss of Reconstruction Information (mLRI) classifier, extends this property and applies it to unseen documents. Several experiments on three well-known multi-class text categorization datasets highlight the stable and generally better performance of the proposed approach in comparison with other popular categorization methods.
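
The idea in the abstract can be illustrated with a minimal sketch, under stated assumptions: one PCA basis is fitted per category, and an unseen document is assigned to the category whose basis reconstructs it with the smallest error. The function names, the `rank` parameter, the per-category mean centering, the squared Euclidean error, and the toy data are all illustrative choices, not details taken from the paper; the actual mLRI classifier may differ.

```python
import numpy as np

def fit_category_bases(X, y, rank):
    """Fit one low-rank PCA basis per category (hypothetical helper)."""
    models = {}
    for label in np.unique(y):
        docs = X[y == label]
        mean = docs.mean(axis=0)
        # Rows of vt are the principal directions of the centered documents,
        # ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(docs - mean, full_matrices=False)
        models[label] = (mean, vt[:rank])
    return models

def reconstruction_error(x, mean, basis):
    """Squared error after projecting x to the low-rank space and back."""
    centered = x - mean
    reconstructed = basis.T @ (basis @ centered)
    return float(np.sum((centered - reconstructed) ** 2))

def predict(models, x):
    """Pick the category that loses the least information on x."""
    return min(models, key=lambda label: reconstruction_error(x, *models[label]))

# Toy data (assumed): category 0 varies along the first feature,
# category 1 along the second.
X = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0],
              [0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 3.0, 0.0]])
y = np.array([0, 0, 0, 1, 1, 1])
models = fit_category_bases(X, y, rank=1)

label_a = predict(models, np.array([4.0, 0.0, 0.0]))
label_b = predict(models, np.array([0.0, 5.0, 0.1]))
```

In this toy setup the first unseen document lies in the direction learned for category 0 and the second lies almost in the direction learned for category 1, so each is reconstructed best by its own category's basis. With real text, `X` would typically hold term-weight vectors such as TF-IDF, which is again an assumption rather than something the abstract states.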

