Information retrieval via universal source coding.

Abstract

This dissertation explores the intersection of information retrieval and universal source coding, and studies an optimal multidimensional source representation from an information-theoretic point of view. Previous research on information retrieval has focused on learning probabilistic or deterministic source models based primarily on two types of source representation, namely fixed-shape partitions and uniform regions. We study the limitations of these conventional representations in capturing the semantics of multidimensional source sequences and propose a new type of primitive source representation generated by a universal source coding technique. We propose a multidimensional incremental parsing algorithm, extended from Lempel-Ziv incremental parsing, together with its three component schemes for multidimensional source coding. The properties of the proposed coding algorithm are exploited in two-dimensional lossless and lossy source coding. The algorithm parses a given multidimensional source sequence into a number of variable-size patches. We call this a parsed representation.

Based on this source representation, we propose an information retrieval framework that analyzes a set of source sequences with a linguistic processing technique, and we implement content-based image retrieval systems. We examine the relevance of the proposed source representation by comparing it with the conventional representation of visual information. To further extend the framework, we apply a probabilistic linguistic processing technique to model the latent aspects of a set of documents. In addition, going beyond the symbol-wise pattern matching employed in the source coding and image retrieval systems, we devise a robust pattern matching method that compares the first- and second-order statistics of source patches.

Qualitative and quantitative analyses of the proposed framework support the superiority of information retrieval based on the parsed representation. The proposed source representation technique and information retrieval frameworks encourage future work on a systematic way of understanding multidimensional sources that parallels linguistic structure.
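
For readers unfamiliar with the Lempel-Ziv incremental parsing that the abstract takes as its starting point, the following is a minimal sketch of the classical one-dimensional (LZ78-style) parse. It is not the dissertation's multidimensional algorithm, whose details are not given here; the function name `incremental_parse` and the example string are illustrative assumptions only.

```python
def incremental_parse(sequence):
    """Classical 1-D Lempel-Ziv (LZ78-style) incremental parsing.

    Illustrative sketch only: the dissertation extends this idea to
    multidimensional sources, and that extension is NOT reproduced here.
    Each emitted phrase is the shortest prefix of the remaining input
    that has not been seen as a phrase before.
    """
    seen = {()}          # set of phrases seen so far (empty phrase included)
    phrases = []         # parsed phrases, in order of appearance
    current = ()         # phrase being grown symbol by symbol
    for symbol in sequence:
        candidate = current + (symbol,)
        if candidate in seen:
            # Still a known phrase: keep extending it.
            current = candidate
        else:
            # First new phrase: record it and start over.
            seen.add(candidate)
            phrases.append(candidate)
            current = ()
    if current:
        # Leftover tail that matches an already-seen phrase.
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    parsed = incremental_parse("aababcabcd")
    print(["".join(p) for p in parsed])  # ['a', 'ab', 'abc', 'abcd']
```

The phrases grow in length as the dictionary fills, which is the one-dimensional analogue of the variable-size patches described in the abstract.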