Phrase-based vector space model in document retrieval.

Abstract

With the advent of the Internet and the World Wide Web, information distribution has become more convenient than ever. However, such an unprecedented abundance of information makes locating a specific piece of information ever more difficult. Since most current search targets are text documents, this research studies the effective retrieval of text documents.

Many document retrieval systems are based on the vector space model, which represents a document as a vector of index terms. Concepts have been proposed as replacements for word stems as the index terms in order to improve retrieval effectiveness. However, past research revealed that such systems did not outperform traditional stem-based systems. Incorporating conceptual similarity derived from knowledge sources should have the potential to improve retrieval effectiveness, yet the incompleteness of the knowledge sources precludes significant improvement. To remedy this problem, we propose to represent documents using phrases. A phrase consists of a concept and several word stems. The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems. The document similarity can in turn be derived from the phrase similarities.

We demonstrate that the phrase-based vector space model is more effective in document retrieval than the traditional stem-based vector space model. Significant effectiveness improvements are observed in both exhaustive search and cluster-based retrieval. We also show that this significant increase in retrieval effectiveness can be achieved without sacrificing too much efficiency.
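The abstract does not give the formulas, so the following is only a minimal sketch of the idea that phrase similarity combines conceptual similarity with shared word stems, and that document similarity is then built from phrase similarities. Everything concrete here is an assumption made for illustration: the `Phrase` type, the toy `CONCEPT_SIM` table and its values, the choice of `max` as the combination rule, and the generalized cosine used for documents. None of it is taken from the thesis.

```python
import math
from typing import Dict, FrozenSet, NamedTuple, Optional, Tuple


class Phrase(NamedTuple):
    """Hypothetical phrase: one concept plus several word stems."""
    concept: Optional[str]   # concept ID, or None if the knowledge source lacks it
    stems: FrozenSet[str]    # word stems that make up the phrase


# Toy conceptual-similarity table standing in for a knowledge source.
# The entry and its value are invented for illustration.
CONCEPT_SIM: Dict[Tuple[str, str], float] = {
    ("lung_cancer", "lung_neoplasm"): 0.9,
}


def concept_similarity(a: Optional[str], b: Optional[str]) -> float:
    """Look up conceptual similarity; 0.0 when the knowledge source is silent."""
    if a is None or b is None:
        return 0.0
    if a == b:
        return 1.0
    return CONCEPT_SIM.get((a, b), CONCEPT_SIM.get((b, a), 0.0))


def stem_overlap(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Cosine of two binary stem sets: shared stems over the geometric mean of sizes."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def phrase_similarity(p: Phrase, q: Phrase) -> float:
    """Assumed combination rule: take the larger of the two signals, so shared
    stems still count when the knowledge source has no entry."""
    return max(concept_similarity(p.concept, q.concept),
               stem_overlap(p.stems, q.stems))


def _bilinear(d1: Dict[Phrase, float], d2: Dict[Phrase, float]) -> float:
    """Sum of weight products over all phrase pairs, scaled by phrase similarity."""
    return sum(w1 * w2 * phrase_similarity(p, q)
               for p, w1 in d1.items()
               for q, w2 in d2.items())


def document_similarity(d1: Dict[Phrase, float], d2: Dict[Phrase, float]) -> float:
    """Assumed generalized cosine over phrase-weight vectors."""
    denom = math.sqrt(_bilinear(d1, d1) * _bilinear(d2, d2))
    return _bilinear(d1, d2) / denom if denom else 0.0


if __name__ == "__main__":
    cancer = Phrase("lung_cancer", frozenset({"lung", "cancer"}))
    tumor = Phrase(None, frozenset({"lung", "tumor"}))  # missing from the toy table
    print(phrase_similarity(cancer, tumor))
    print(document_similarity({cancer: 1.0}, {tumor: 1.0}))
```

In this sketch, a phrase that the toy knowledge source does not cover can still match through its stems, which is one way to read the abstract's motivation for pairing concepts with stems. A real system would need a weighting scheme and a combination rule chosen and validated against retrieval experiments.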

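Bibliographic record

  • Author: Mao, Wenlei
  • Author affiliation: University of California, Los Angeles
  • Degree-granting institution: University of California, Los Angeles
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2003
  • Pages: 167
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology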
