Journal: Pattern Recognition Letters

Memory-restricted latent semantic analysis to accumulate term-document co-occurrence events

Abstract

This paper addresses a novel adaptive problem of obtaining a new type of term-document weight. In our problem, the input is a long sequence of co-occurrence events between terms and documents, namely, a stream of term-document co-occurrence events. Given such a stream, we learn unknown latent vectors of terms and documents such that their inner product adaptively approximates the target query-based term-document weights resulting from accumulating co-occurrence events. To this end, we propose a new incremental dimensionality reduction algorithm for adaptively learning a latent semantic index of terms and documents over a collection. The core of our algorithm is its partial updating style, where only a small number of latent vectors are modified for each term-document co-occurrence, while most other latent vectors remain unchanged. Experimental results on small and large standard test collections demonstrate that the proposed algorithm can stably learn the latent semantic index of terms and documents, showing an improvement in retrieval performance over the baseline method.
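
The partial updating style described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify. It is a generic normalized gradient update, written under several assumptions: the class name `StreamingLatentIndex`, the parameters `dim`, `lr`, and `seed`, and the use of a raw accumulated count as the target weight are all hypothetical. The explicit `counts` table is kept only to make the target visible; a genuinely memory-restricted method would presumably avoid storing it.

```python
import random


class StreamingLatentIndex:
    """Toy incremental factorization over a stream of (term, doc) events.

    Hypothetical illustration only; not the method from the paper.
    """

    def __init__(self, dim=4, lr=0.5, seed=0):
        self.dim = dim
        self.lr = lr                  # step size, assumed to lie in (0, 1]
        self.rng = random.Random(seed)
        self.term_vecs = {}           # term -> latent vector
        self.doc_vecs = {}            # doc  -> latent vector
        self.counts = {}              # (term, doc) -> accumulated count (assumed target)

    def _vec(self, table, key):
        # Lazily create a small random vector for a key seen for the first time.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return table[key]

    def score(self, term, doc):
        # Inner product of the term and document latent vectors.
        t = self._vec(self.term_vecs, term)
        d = self._vec(self.doc_vecs, doc)
        return sum(a * b for a, b in zip(t, d))

    @staticmethod
    def _step(moving, fixed, err, lr):
        # Normalized step along `fixed`: the inner product moves toward the
        # target by a fraction lr * |fixed|^2 / (1 + |fixed|^2) of the error.
        scale = lr * err / (1.0 + sum(x * x for x in fixed))
        for k in range(len(moving)):
            moving[k] += scale * fixed[k]

    def observe(self, term, doc):
        # Accumulate the event, then update ONLY the two vectors involved.
        key = (term, doc)
        self.counts[key] = self.counts.get(key, 0) + 1
        target = self.counts[key]
        t = self._vec(self.term_vecs, term)
        d = self._vec(self.doc_vecs, doc)
        self._step(t, d, target - self.score(term, doc), self.lr)
        self._step(d, t, target - self.score(term, doc), self.lr)
```

Feeding a stream is then a loop such as `for term, doc in events: index.observe(term, doc)`. Each call touches exactly one term vector and one document vector, which is the property the abstract emphasizes; every other latent vector is left unchanged.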
