Journal: International Journal of Computers, Communications and Control

An Application of Latent Semantic Analysis for Text Categorization


Abstract

Discovering the major topics in a body of text is a challenging task; it provides a better understanding of the whole corpus and can be regarded as a text categorization problem. The goal of this paper is to apply the latent semantic analysis (LSA) approach to extract common factors that represent concepts hidden in a large group of texts. LSA involves three steps: the first sets up a term-document matrix of term frequencies; the second transforms the term frequencies into a weighted term-document matrix using various weighting schemes; the third performs singular value decomposition (SVD) on the matrix to reduce its dimensionality. The reduced-order SVD is the best rank-k (k-dimensional) approximation to the original matrix in the least-squares sense. The experiment uses more than fifteen hundred research paper abstracts from a specific field. Because different factor solutions of the LSA suggest different levels of aggregation, this work examines thirteen solutions in the experiment. The results show that LSA is able to identify not only principal categories, but also major themes contained in the text.
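The three LSA steps described in the abstract can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the authors' implementation: the toy corpus, the function names, the choice of TF-IDF as the weighting scheme, and the value k = 2 are all assumptions made for the example (the abstract mentions "various weighting schemes" without naming them).

```python
import numpy as np

# Hypothetical toy corpus; the paper itself uses 1,500+ research abstracts.
docs = [
    "latent semantic analysis of text",
    "text categorization with semantic analysis",
    "singular value decomposition of a matrix",
    "matrix decomposition reduces dimensionality",
]

def term_document_matrix(docs):
    """Step 1: raw term counts (rows = terms, columns = documents)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            counts[index[w], j] += 1
    return vocab, counts

def tfidf_weight(counts):
    """Step 2: one possible weighting scheme (TF-IDF); an assumption here."""
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)   # documents containing each term
    idf = np.log(n_docs / df) + 1.0         # +1 keeps ubiquitous terms nonzero
    return counts * idf[:, None]

def truncated_svd(matrix, k):
    """Step 3: SVD, keeping only the k largest singular triplets."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :], s

vocab, counts = term_document_matrix(docs)
weighted = tfidf_weight(counts)

k = 2  # number of factors; the paper compares several factor solutions
u_k, s_k, vt_k, s_all = truncated_svd(weighted, k)

approx = u_k @ np.diag(s_k) @ vt_k        # rank-k reconstruction
doc_coords = (np.diag(s_k) @ vt_k).T      # each row: a document in factor space
error = np.linalg.norm(weighted - approx) # Frobenius-norm approximation error
```

The Frobenius error of the rank-k reconstruction equals the square root of the sum of the squared discarded singular values, which is the sense in which the truncated SVD is the best k-dimensional approximation. Re-running the last block with different values of `k` gives factor solutions at different levels of aggregation.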
