Journal: Neural Processing Letters

LSISOM - A Latent Semantic Indexing Approach to Self-Organizing Maps of Document Collections

Abstract

The Self-Organizing Map (SOM) algorithm has been used with much success in a variety of applications for the automatic organization of full-text document collections. A great advantage of the SOM method is that document collections can be ordered in such a way that documents with similar content are positioned at nearby locations of the two-dimensional SOM lattice. The resulting ordered map thus presents a general view of the document collection, which helps the exploration of information contained in the whole document space. The most notable example of such an application is the WEBSOM method, where the document collection is ordered onto a map by using word category histograms to represent the documents' data vectors. In this paper, we introduce the LSISOM method, which resembles WEBSOM in the sense that the document maps are generated from word category histograms rather than simple histograms of the words. However, a major difference between the two methods is that in WEBSOM the word category histograms are formed using statistical information of short word contexts, whereas in LSISOM these histograms are obtained from the SOM clustering of the Latent Semantic Indexing representation of document terms.
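
The pipeline outlined in the abstract (LSI term vectors, a SOM over terms to form word categories, word category histograms per document, then a SOM over documents) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract gives no details on term weighting, LSI rank, map sizes, or training schedule, so the function names (`train_som`, `best_matching_units`, `lsisom`), the use of `U_k * s_k` as the term representation, the row normalization of the histograms, and all parameter values are assumptions made here for illustration.

```python
import numpy as np

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM online; returns weights of shape (rows*cols, dim).
    Linear decay of learning rate and neighborhood width is an assumed schedule."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Lattice coordinates of each map unit, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize units from randomly chosen data points.
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float).copy()
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            # Best-matching unit: closest weight vector in Euclidean distance.
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            # Gaussian neighborhood around the BMU on the lattice.
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights

def best_matching_units(data, weights):
    """Index of the closest map unit for every row of data."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def lsisom(term_doc, k, term_map=(3, 3), doc_map=(4, 4), seed=0):
    """term_doc: (n_terms, n_docs) count matrix. k, term_map, doc_map are assumed settings."""
    n_terms, n_docs = term_doc.shape
    # LSI: truncated SVD; rows of U_k * s_k serve as term vectors (an assumed convention).
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]
    # SOM clustering of terms: each map unit acts as one word category.
    term_som = train_som(term_vecs, *term_map, seed=seed)
    term_cat = best_matching_units(term_vecs, term_som)
    n_cat = term_map[0] * term_map[1]
    # Word category histogram per document: sum term counts within each category.
    assign = np.zeros((n_terms, n_cat))
    assign[np.arange(n_terms), term_cat] = 1.0
    hist = term_doc.T @ assign
    hist = hist / np.maximum(hist.sum(axis=1, keepdims=True), 1e-12)
    # Document map: SOM trained on the histograms.
    doc_som = train_som(hist, *doc_map, seed=seed + 1)
    doc_pos = best_matching_units(hist, doc_som)
    return term_cat, hist, doc_pos

# Toy term-document counts (6 terms x 4 documents), made up for illustration only.
toy = np.array([[2, 1, 0, 0],
                [1, 2, 0, 0],
                [0, 1, 1, 0],
                [0, 0, 2, 1],
                [0, 0, 1, 2],
                [1, 0, 0, 1]], dtype=float)
term_cat, hist, doc_pos = lsisom(toy, k=2)
```

Here `term_cat` assigns each term to a word category, `hist` holds one normalized word category histogram per document, and `doc_pos` gives each document's unit on the document map, so documents with similar histograms tend to land on the same or neighboring units.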
