
On improving information retrieval performance from structured, semi-structured and un-structured information sources.

Abstract

The field of unstructured data retrieval for simple data types such as text and structured data retrieval in relational data models for transactional processing has already been well researched and commercially developed. However, more complex data types and models such as XML (as semi-structured data), data warehouses (as structured data), images (as unstructured data), etc. pose additional research challenges. The goal of this work is to address such information retrieval performance issues and challenges.

As XML is an evolving semi-structured data representation format, techniques for indexing and retrieval of XML data are drawing increasing attention. We have proposed a memory-efficient index structure and an efficient algorithm for incremental indexing of XML document collections. The experimental results show that our proposed index structure outperforms earlier schemes in terms of indexing time and storage requirements.

Given the growth in size of image collections over the last few years, Content-Based Image Retrieval (CBIR) systems are required to effectively and efficiently access images using information contained in them. Perception-based image retrieval, on the other hand, plays an important role in overcoming some of the semantic problems associated with CBIR. We have proposed a method that uses the concept of Inverse Image Frequency for perception-based color image quantization to improve traditional quantization schemes. Additionally, a cluster-based approach for efficient CBIR that uses a similarity-preserving space transformation method is proposed. Our results show that it offers superior response time with sufficiently high retrieval accuracy.

Lastly, for improving online analytical processing, our focus has been on the more challenging and evolving multidimensional data model. Earlier work does not completely address performance issues, such as query response time and view maintenance time, in data warehouses. We propose a hybrid approach for the selection of views that combines the improved response time of the static approach and the automated tuning capability of the dynamic approach. Experimental results show that the hybrid approach outperforms both the static and the dynamic approaches to view selection.

For future work, we suggest the integration of our results in these different areas and the evaluation of their applicability to real-life multimodal systems applications.

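Bibliographic record

  • Author: Shah, Biren N.
  • Author affiliation: University of Louisiana at Lafayette
  • Degree-granting institution: University of Louisiana at Lafayette
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 183
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Automation technology, computer technology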
