
Advancing information retrieval through databases, fusion and information extraction.


Abstract

This dissertation investigates improvements to Information Retrieval (IR) in three areas: novel uses of database technology for information retrieval, fusion of Information Retrieval strategies in a common environment, and a new kind of relevance feedback called entity-based feedback.

Entity-based feedback is a novel technique for identifying query expansion terms. Entities are identified using a commercial extraction tool that tags Person, Organization and Location entities. These are stored in the inverted index along with the terms and phrases. Since queries are typically short, the first-pass retrieval uses only terms and phrases. Entities from the top documents are then used alone or along with regular feedback terms and phrases. Experimental results show that entities improved retrieval effectiveness in more queries than not. Further filtering of entities is suggested to eliminate those that do not help.

Combining results from disparate IR systems (fusion) has achieved some success in the past. However, disparate systems differ in many system features, so it is unclear what contributes to the improvements. This research investigates the effectiveness of fusion within a common environment using vector space, probabilistic, and weighted Boolean strategies. Several thousand combinations were tested using 150 queries against a collection of 528,155 documents (two gigabytes in total). The results indicate that these strategies return very similar result sets and do not improve with fusion. Varying the query representation, in contrast, yielded improvement. When both retrieval strategy and query representation are varied, even further improvement is gained.

Using databases for Information Retrieval permits integration of text searches with searches of structured database data. This research verifies that relational algebra and standard Structured Query Language (SQL) support leading probabilistic similarity measures.
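As an illustration of how a probabilistic similarity measure can be expressed in standard SQL, the following is a minimal sketch and not the dissertation's implementation. It assumes BM25 as the probabilistic measure (the abstract does not name one), a hypothetical four-table schema (`docs`, `postings`, `term_stats`, `query`), and the common defaults `k1 = 1.2` and `b = 0.75`. The IDF values are precomputed in Python because the logarithm function is not guaranteed to be available in every SQLite build.

```python
import math
import sqlite3

K1, B = 1.2, 0.75  # common BM25 defaults; an assumption, not from the source


def build_index(docs):
    """Load a toy collection into relational tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE docs(doc_id INTEGER PRIMARY KEY, length INTEGER);
        CREATE TABLE postings(doc_id INTEGER, term TEXT, tf INTEGER);
        CREATE TABLE term_stats(term TEXT PRIMARY KEY, idf REAL);
        CREATE TABLE query(term TEXT);
    """)
    doc_freq = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        conn.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, len(tokens)))
        counts = {}
        for tok in tokens:
            counts[tok] = counts.get(tok, 0) + 1
        for tok, tf in counts.items():
            conn.execute("INSERT INTO postings VALUES (?, ?, ?)", (doc_id, tok, tf))
            doc_freq[tok] = doc_freq.get(tok, 0) + 1
    n = len(docs)
    for tok, df in doc_freq.items():
        # BM25 inverse document frequency, computed outside SQL.
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        conn.execute("INSERT INTO term_stats VALUES (?, ?)", (tok, idf))
    return conn


def search(conn, query_text):
    """Rank documents with one SQL statement: joins, arithmetic, GROUP BY."""
    conn.execute("DELETE FROM query")
    conn.executemany("INSERT INTO query VALUES (?)",
                     [(t,) for t in set(query_text.lower().split())])
    sql = """
        SELECT p.doc_id,
               SUM(s.idf * p.tf * (:k1 + 1) /
                   (p.tf + :k1 * (1 - :b + :b * d.length /
                        (SELECT AVG(length) FROM docs)))) AS score
        FROM query q
        JOIN postings p ON p.term = q.term
        JOIN term_stats s ON s.term = p.term
        JOIN docs d ON d.doc_id = p.doc_id
        GROUP BY p.doc_id
        ORDER BY score DESC, p.doc_id
    """
    return conn.execute(sql, {"k1": K1, "b": B}).fetchall()


if __name__ == "__main__":
    toy = {
        1: "database retrieval with relational database",
        2: "fusion of retrieval strategies",
        3: "entity feedback",
    }
    for doc_id, score in search(build_index(toy), "database retrieval"):
        print(doc_id, round(score, 3))
```

Because the ranking is an ordinary query, it can be joined with other structured tables in the same statement, which is the kind of integration of text and structured search the abstract describes.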
In addition, this work proposes a schema design using a multidimensional database (MDB) and Online Analytic Processing (OLAP), which permits advanced, interactive analysis of document collections. Items such as publisher, location, organizations and persons are extracted from the text and stored for searching. Additional structured data, such as corporate or public databases, may be linked in. Using OLAP tools, the end user interactively explores both text and structured data, moving seamlessly through the documents.
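The fusion experiments summarized in this abstract combine ranked results from several retrieval strategies. The abstract does not say which combination function was used, so the sketch below assumes two widely used ones, CombSUM and CombMNZ, applied after min-max score normalization. The function names and the dictionary-based run format are illustrative choices, not taken from the dissertation.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale one system's scores into [0, 1] so that runs are comparable."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def fuse(runs, method="combsum"):
    """Fuse several {doc_id: score} runs into one ranked list.

    combsum: sum of normalized scores.
    combmnz: that sum multiplied by the number of runs that retrieved the doc.
    """
    totals = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            totals[doc] += score
            hits[doc] += 1
    if method == "combsum":
        fused = dict(totals)
    elif method == "combmnz":
        fused = {doc: totals[doc] * hits[doc] for doc in totals}
    else:
        raise ValueError(f"unknown fusion method: {method}")
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    vector_space = {"d1": 3.0, "d2": 1.0, "d3": 2.0}
    probabilistic = {"d2": 0.9, "d4": 0.1}
    print(fuse([vector_space, probabilistic], "combsum"))
    print(fuse([vector_space, probabilistic], "combmnz"))
```

When the input runs contain nearly the same documents in nearly the same order, as the abstract reports for the three strategies, both functions mostly reproduce that shared ranking, which is consistent with the finding that fusion alone gave no improvement.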
