Journal of Computing and Information Technology

On the Performance of Latent Semantic Indexing-based Information Retrieval

Abstract

Conventional vector-based Information Retrieval (IR) models, namely the Vector Space Model (VSM) and the Generalized Vector Space Model (GVSM), represent documents and queries as vectors in a multidimensional space. This high-dimensional data places great demands on computing resources. To overcome this problem, Latent Semantic Indexing (LSI), a variant of the VSM, projects documents into a lower-dimensional space. The IR literature reports that the LSI model is 30% more effective than classical VSM models. However, statistical significance tests are required to assess the reliability of such comparisons, and this paper addresses that issue. We discuss the trade-offs among VSM, GVSM, and LSI, evaluate their performance differences on four test document collections, and then analyze the statistical significance of these differences.
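The three retrieval models the abstract contrasts can be illustrated on a toy collection. This is a minimal sketch under our own assumptions, not the paper's implementation: documents are columns of a term-document matrix `A` holding raw term counts (no tf-idf weighting); the VSM ranks by cosine similarity in term space; the GVSM is reduced to the simplified form cos(Aᵀd, Aᵀq), so terms that co-occur in documents reinforce each other (the original GVSM formulation is more general, and the paper may use a different variant); and LSI projects documents and the query onto the top `k` left singular vectors of `A`. The function names, vocabulary, and matrix are hypothetical.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(u @ v / (nu * nv))


def vsm_scores(A: np.ndarray, q: np.ndarray) -> np.ndarray:
    """VSM: cosine between the query and each document column of A (terms x docs)."""
    return np.array([cosine(A[:, j], q) for j in range(A.shape[1])])


def gvsm_scores(A: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Simplified GVSM (an assumption): compare A.T @ d with A.T @ q, so the
    inner product is d.T @ (A @ A.T) @ q and co-occurring terms count as related."""
    docs = A.T @ A          # column j equals A.T @ A[:, j]
    q_mapped = A.T @ q
    return np.array([cosine(docs[:, j], q_mapped) for j in range(A.shape[1])])


def lsi_scores(A: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """LSI: project documents and query onto the top-k left singular vectors
    of A, then rank by cosine in the k-dimensional latent space."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)  # singular values sorted descending
    Uk = U[:, :k]
    docs_k = Uk.T @ A       # k x n_docs
    q_k = Uk.T @ q          # k-dimensional query
    return np.array([cosine(docs_k[:, j], q_k) for j in range(A.shape[1])])


# Hypothetical toy collection: rows are terms, columns are documents.
TERMS = ["car", "automobile", "engine", "flower", "garden"]
A = np.array([
    [1, 0, 0],   # car
    [0, 1, 0],   # automobile
    [1, 1, 0],   # engine
    [0, 0, 1],   # flower
    [0, 0, 1],   # garden
], dtype=float)
q = np.array([1, 0, 0, 0, 0], dtype=float)  # query: "car"

if __name__ == "__main__":
    print("VSM :", vsm_scores(A, q))
    print("GVSM:", gvsm_scores(A, q))
    print("LSI :", lsi_scores(A, q, k=2))
```

On this toy collection the VSM gives document 1 (automobile, engine) a score of 0 for the query "car" because the two share no term, while the GVSM and LSI variants give it a positive score through the shared term "engine". LSI also works in `k` dimensions instead of one per term, at the cost of computing an SVD of the whole term-document matrix.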
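The abstract's point that a reported gain such as "30% more effective" needs a significance test can be made concrete with per-query scores. The abstract does not say which test the paper applies; the sketch below uses the exact two-sided sign test as one simple choice (a paired t-test or the Wilcoxon signed-rank test are common alternatives). The inputs are assumed to be an effectiveness measure, such as average precision, for two systems on the same queries; the function name `sign_test` and the scores are hypothetical.

```python
from math import comb
from typing import Sequence


def sign_test(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Exact two-sided sign test for paired per-query scores of two systems.

    Queries on which both systems tie are discarded. Returns the p-value under
    the null hypothesis that each system wins a non-tied query with probability 0.5.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired: one value per query for each system")
    wins = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    losses = sum(1 for a, b in zip(scores_a, scores_b) if a < b)
    n = wins + losses
    if n == 0:
        return 1.0  # all ties: no evidence of a difference
    k = min(wins, losses)
    # One tail of Binomial(n, 0.5): P(X <= k), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical per-query average precision for two systems on the same 5 queries.
    lsi = [0.52, 0.40, 0.61, 0.35, 0.48]
    vsm = [0.40, 0.31, 0.47, 0.27, 0.37]
    print(f"p = {sign_test(lsi, vsm):.4f}")  # prints p = 0.0625
```

In this made-up example the first system wins on every query and its mean score is roughly 30% higher, yet with only five queries the two-sided p-value is 2/32 = 0.0625, above the conventional 0.05 level. Significance depends on the number of queries and the consistency of the gains, not only on the size of the average improvement.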
