Published in: Computational Intelligence and Neuroscience (listed under US National Institutes of Health literature)
Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure

Abstract

Recently, LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) has been proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, although its representative quality has been validated, it is often criticized for its low discriminative power in representing documents. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. First, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Second, we propose SVD on clusters for LSI and explain theoretically that it involves two manipulations: dimension expansion of document vectors and dimension projection using SVD. Moreover, we develop updating processes to fold new documents and terms into a matrix decomposed by SVD on clusters. Third, two corpora, one Chinese and one English, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters improves the precision of the interdocument similarity measure in comparison with other SVD-based LSI methods.
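The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading of "SVD on clusters": documents are first grouped into clusters, a truncated SVD is applied to each cluster's term-document submatrix separately, and interdocument similarity is then measured by cosine similarity on the reconstructed columns. The function names, the toy matrix, the cluster labels, and the choice of rank are all illustrative assumptions and are not taken from the paper; the clustering step itself is assumed to have been done beforehand.

```python
import numpy as np


def truncated_svd_approx(A, rank):
    """Best rank-`rank` approximation of A in the least-squares sense."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = min(rank, len(s))
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def svd_on_clusters(A, labels, rank):
    """Apply a truncated SVD to each document cluster separately.

    A      : term-document matrix (rows = terms, columns = documents)
    labels : cluster label per document (assumed given, e.g. from k-means)
    rank   : number of singular values kept per cluster (hypothetical choice)
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    A_hat = np.zeros_like(A)
    for c in np.unique(labels):
        cols = np.where(labels == c)[0]
        # Each cluster's submatrix is approximated independently.
        A_hat[:, cols] = truncated_svd_approx(A[:, cols], rank)
    return A_hat


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the columns (documents) of X."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn.T @ Xn


# Toy term-document counts: 5 terms, 6 documents, two assumed clusters.
A = np.array([
    [2, 1, 0, 0, 0, 1],
    [1, 2, 1, 0, 0, 0],
    [0, 1, 2, 0, 1, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 1, 1, 2, 2],
])
labels = [0, 0, 0, 1, 1, 1]

A_hat = svd_on_clusters(A, labels, rank=1)
print(np.round(cosine_similarity_matrix(A_hat), 3))
```

Standard LSI would instead call truncated_svd_approx once on the whole matrix A; comparing the two similarity matrices on a labeled corpus is one way to probe the discriminative-power claim made in the abstract.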