Understanding and Enhancing the Folding-In Method in Latent Semantic Indexing


Abstract

Latent Semantic Indexing (LSI) has been shown to capture the semantic structure of document collections effectively, and it is widely used in content-based text retrieval. However, in many real-world applications that deal with very large document collections, LSI suffers from high computational complexity, which stems from the Singular Value Decomposition (SVD). As a result, the folding-in method is widely used in practice as an approximation to LSI. The folding-in method is, however, generally implemented "as is", without a detailed analysis of its effectiveness and performance. Consequently, its performance cannot be guaranteed. In this paper, we first illustrate the underlying principle of the folding-in method from a linear algebra point of view and analyze some commonly used existing techniques. Based on this theoretical analysis, we propose a novel algorithm to guide the implementation of the folding-in method. Our method is justified and evaluated through a series of experiments on various classical IR data sets. The results indicate that our method is effective and performs consistently across different document collections.
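
For orientation, the textbook form of folding-in can be sketched as follows. This is a minimal illustration of the standard projection d_hat = Sigma_k^{-1} U_k^T d, not the novel algorithm proposed in the paper, whose details the abstract does not give. The function names `lsi_fit` and `fold_in`, the random term-document matrix, and the choice k = 3 are all assumptions made for the example.

```python
import numpy as np


def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-document matrix A (terms x documents).

    Returns U_k (terms x k), the top-k singular values s_k, and
    V_k (documents x k), whose rows are the document coordinates.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(U_k, s_k, d):
    """Project a term vector d into the existing k-dimensional LSI space.

    Standard folding-in: d_hat = Sigma_k^{-1} U_k^T d.
    The SVD itself is not recomputed, which is why this is cheap
    but only approximates a full LSI update.
    """
    return (U_k.T @ d) / s_k


# Hypothetical data: 8 terms, 5 documents, latent dimension k = 3.
rng = np.random.default_rng(0)
A = rng.random((8, 5))
U_k, s_k, V_k = lsi_fit(A, k=3)

# Fold in a new, previously unseen document vector.
d_new = rng.random(8)
d_hat = fold_in(U_k, s_k, d_new)
```

A useful sanity check on this formulation is that folding in a document already present in A reproduces that document's row of V_k, because the columns of U are orthonormal. For genuinely new documents the basis U_k is left unchanged, which is the source of the approximation error that the abstract refers to.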

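Bibliographic details

  • Source
    《 》| 2006 | pp. 104-113 | 10 pages
  • Conference location: Krakow (PL)
  • Authors

    Xiang Wang; Xiaoming Jin

  • Author affiliation

    School of Software, Tsinghua University, Beijing 100084, China

  • Conference organizer
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: TP311.13
  • Keywords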
