Journal: Knowledge-Based Systems

Latent Semantic Analysis For Text Categorization Using Neural Network


Abstract

New text categorization models based on a back-propagation neural network (BPNN) and a modified back-propagation neural network (MBPNN) are proposed. An efficient feature selection method is used to reduce dimensionality and improve performance. The basic BPNN learning algorithm suffers from slow training, so we modify it to accelerate training; categorization accuracy improves as a result. Traditional word-matching-based text categorization systems use the vector space model (VSM) to represent documents. However, the VSM needs a high-dimensional space to represent documents and does not take into account the semantic relationships between terms, which can lead to poor classification accuracy. Latent semantic analysis (LSA) can overcome these problems by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term or document is represented as a vector. This not only greatly reduces the dimensionality but also uncovers important associative relationships between terms. We test our categorization models on the 20-newsgroup data set. Experimental results show that the models using MBPNN outperform those using the basic BPNN, and that applying LSA in our system yields a dramatic dimensionality reduction while achieving good classification results.
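
The LSA step described in the abstract can be illustrated with a truncated singular value decomposition of a term-document matrix. The sketch below is not the authors' implementation: the toy vocabulary, the counts, the choice of k = 2, and the helper names `lsa`, `fold_in`, and `cosine` are all assumptions made for illustration, and the paper's feature selection and MBPNN training are not shown. It only demonstrates how documents that share no exact wording pattern end up close together in a low-dimensional concept space, which is the kind of representation a neural classifier would then take as input.

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
# Vocabulary and counts are invented for illustration only.
terms = ["car", "engine", "wheel", "galaxy", "orbit", "star"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)


def lsa(term_doc, k):
    """Rank-k LSA: return term vectors, document vectors, and U_k."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    term_vecs = U_k * s_k      # one k-dimensional row per term
    doc_vecs = Vt_k.T * s_k    # one k-dimensional row per document
    return term_vecs, doc_vecs, U_k


def fold_in(term_counts, U_k):
    """Project an unseen document's term counts into the concept space.

    Uses the same scaling as doc_vecs above (d^T U_k = S_k v).
    """
    return term_counts @ U_k


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


term_vecs, doc_vecs, U_k = lsa(A, k=2)

# Documents 0 and 1 are about the same topic; document 2 is not.
print("doc0 vs doc1:", round(cosine(doc_vecs[0], doc_vecs[1]), 3))
print("doc0 vs doc2:", round(cosine(doc_vecs[0], doc_vecs[2]), 3))

# An unseen document mentioning only "car" and "wheel".
new_doc = fold_in(np.array([1, 0, 1, 0, 0, 0], dtype=float), U_k)
print("new doc vs doc0:", round(cosine(new_doc, doc_vecs[0]), 3))
```

Here each document is reduced from six term dimensions to two concept dimensions. On a real corpus such as 20-newsgroup the vocabulary would run to many thousands of terms, and k would be chosen empirically.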
