An Application of Latent Semantic Analysis for Text Categorization

Gang Kou; Yi Peng

Abstract

Discovering the major topics in a body of text is a challenging task. These topics provide a better understanding of the whole corpus, and finding them can be regarded as a text categorization problem. The goal of this paper is to apply the latent semantic analysis (LSA) approach to extract common factors that represent concepts hidden in a large collection of texts. LSA involves three steps. The first step sets up a term-document matrix. The second step transforms the raw term frequencies in that matrix using one of various weighting schemes. The third step performs singular value decomposition (SVD) on the weighted matrix to reduce its dimensionality. The reduced-order (truncated) SVD is the best rank-k approximation of the original matrix in the least-squares sense. The experiment uses more than 1,500 research paper abstracts from a specific field. Different factor solutions of LSA suggest different levels of aggregation, so this work examines thirteen solutions. The results show that LSA is able to identify not only principal categories but also major themes contained in the text.
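
For reference, the optimality of the reduced-order SVD is the Eckart-Young theorem. The notation below is ours, not the paper's. A is the weighted m × n term-document matrix of rank r. The vectors u_i and v_i are the columns of U and V. The matrices U_k, Σ_k, V_k keep only the first k singular values and vectors:

```latex
\begin{aligned}
A &= U \Sigma V^{\top}, \qquad \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0, \\
A_k &= U_k \Sigma_k V_k^{\top} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}, \qquad k < r, \\
\min_{\operatorname{rank}(B) \le k} \lVert A - B \rVert_F &= \lVert A - A_k \rVert_F = \Big( \sum_{i=k+1}^{r} \sigma_i^{2} \Big)^{1/2}.
\end{aligned}
```

In this notation, a k-factor solution keeps the k largest singular values and discards the rest.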

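The three LSA steps can be sketched with NumPy on a toy corpus. This is not the authors' implementation. The following are all illustrative assumptions:

- the four example documents;
- the use of tf-idf weighting, which is one common choice among the "various weighting schemes" mentioned;
- k = 2;
- the hypothetical helper name `lsa`.

```python
import numpy as np

# Hypothetical toy corpus (NOT data from the paper).
docs = [
    "svd reduces matrix dimensionality",
    "latent semantic analysis uses svd",
    "text categorization groups documents",
    "documents and text share topics",
]

# Step 1: term-document matrix of raw counts (rows = terms, columns = documents).
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[index[w], j] += 1

# Step 2: weighting. Assumption: tf-idf is used here as one example scheme.
df = np.count_nonzero(counts, axis=1)      # number of documents containing each term
idf = np.log(len(docs) / df)               # inverse document frequency per term
weighted = counts * idf[:, None]           # scale each term row by its idf

# Step 3: truncated SVD keeping the k largest singular values (hypothetical helper).
def lsa(matrix, k):
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

k = 2
U_k, s_k, Vt_k = lsa(weighted, k)
approx = U_k @ np.diag(s_k) @ Vt_k         # best rank-k approximation of `weighted`

# Singular-vector signs are arbitrary, so rank terms by absolute loading.
for f in range(k):
    top = np.argsort(-np.abs(U_k[:, f]))[:3]
    print(f"factor {f + 1}:", [vocab[i] for i in top])
```

Re-running with a different `k` mirrors the paper's comparison of factor solutions. Fewer factors typically merge terms into broad categories, while more factors tend to separate finer themes.
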
Bibliographic details

Source: International Journal of Computers, Communications and Control, 2015, No. 3
Authors: Gang Kou; Yi Peng
Original format: PDF
Chinese Library Classification (CLC): automation technology, computer technology
Date added to database: 2022-08-18 09:15:33

Similar literature

1. An Application of Latent Semantic Analysis for Text Categorization [J]. G. Kou, Y. Peng. International Journal of Computers, Communications & Control, 2015, No. 3.
2. An Application of Latent Semantic Analysis for Text Categorization [J]. Gang Kou, Yi Peng. IAENG International Journal of Computer Science, 2015, No. 3.
3. Conceptually categorizing geographic features from text based on latent semantic analysis and ontologies [J]. Yuxia Huang. Annals of GIS, 2016, No. 1a4.
4. Latent Semantic Matching: Application to Cross-language Text Categorization without Alignment Information [C]. Tsutomu Hirao, Tomoharu Iwata, Masaaki Nagata. Annual Meeting of the Association for Computational Linguistics, 2013.
5. Discovering latent topical phrases in document collections and networks with text components: Leveraging text mining and information network analysis for human oriented applications [D]. Marina Grigoryevna Danilevsky, 2014.
6. Enhancing Text Categorization with Semantic-enriched Representation and Training Data Augmentation [O]. Xinghua Lu, Bin Zheng, Atulya Velivelli, 2006.
7. Learning from text: Matching readers and texts by Latent Semantic Analysis [O]. M.E. Schreiner, Bob Rehder, Darrell Laham, 1998.
