
Integrating text search and relational databases: Functionality and performance.



Abstract

Applications increasingly involve a mix of free-text documents and traditional relational tables [46]. Commercial relational database management systems (RDBMSs) store both types of data and support access through keyword search, traditional relational operators in SQL, or mixed queries that combine both. However, application developers lack the tools for addressing functionality and performance concerns that are available for traditional scalar data and that are also needed when keyword search is integrated into an RDBMS. With regard to functionality, this thesis proposes TextViews as a fully declarative way to specify virtual collections of virtual documents for use with keyword search. For performance, this thesis proposes TEXTURE, a benchmark for comparing RDBMSs given a workload of mixed queries.

Current RDBMSs store a document as a single attribute value and a collection in a single table. TextViews are an adaptation of relational views for defining documents that are composed of multiple documents, possibly stored in multiple tables. Such documents are grouped into a collection and ranked using keyword search. Keyword search can be evaluated either by materializing the TextView and then searching it, or by using inverted indexes built on the base tables. Inverted indexes do not take advantage of the scalar attributes used in the selection and grouping operations specified in TextView definitions. Consequently, we propose several alternative indexes, for which we demonstrate an order-of-magnitude improvement in keyword search response time with a modest increase in storage compared to inverted indexes.

The TEXTURE benchmark [28] compares RDBMSs by measuring the response time needed to evaluate a workload of mixed queries. A micro-benchmark design is used to allow fine-grained control over the query workload and data set. To support database scale-up experiments, TextGen, a novel synthetic text generator, was developed and evaluated. TextGen is unique in that it can accurately scale up an input "seed" text collection while preserving important data characteristics. The TEXTURE benchmark was used to evaluate three commercial RDBMSs, demonstrating large differences between them for a variety of workloads.
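The two evaluation strategies named in the abstract (materialize the virtual documents and then search, or search through an inverted index on the base table) can be illustrated with a minimal Python sketch. This is not the thesis's implementation or its proposed indexes: the table layout, the `thread_id` grouping attribute, the sample rows, and all function names are hypothetical, and ranking is omitted in favor of plain conjunctive matching.

```python
from collections import defaultdict

# Hypothetical base table: (row_id, thread_id, text).
# thread_id plays the role of the scalar grouping attribute.
ROWS = [
    (1, "t1", "inverted index for keyword search"),
    (2, "t1", "relational views group rows"),
    (3, "t2", "keyword ranking in a database"),
]

def tokenize(text):
    # Naive whitespace tokenizer (assumption; real systems do much more).
    return text.lower().split()

def materialize(rows):
    # Strategy 1: build each virtual document by merging the rows of a group.
    docs = defaultdict(list)
    for _row_id, group, text in rows:
        docs[group].extend(tokenize(text))
    return dict(docs)

def search_materialized(rows, keywords):
    # Scan every materialized virtual document for all keywords.
    docs = materialize(rows)
    return sorted(g for g, toks in docs.items()
                  if all(k in toks for k in keywords))

def build_inverted_index(rows):
    # Strategy 2: term -> set of base-table row ids.
    index = defaultdict(set)
    for row_id, _group, text in rows:
        for tok in tokenize(text):
            index[tok].add(row_id)
    return index

def search_indexed(rows, index, keywords):
    # The index knows nothing about groups, so each posting must be
    # mapped back to its group through the scalar attribute.
    group_of = {row_id: group for row_id, group, _text in rows}
    per_keyword = [{group_of[r] for r in index.get(k, set())}
                   for k in keywords]
    return sorted(set.intersection(*per_keyword)) if per_keyword else []
```

In this sketch, a query for `keyword` and `views` matches group `t1` under both strategies even though no single base row contains both terms. The extra row-to-group lookup in `search_indexed` is the kind of step that arises because a plain inverted index does not use the scalar attributes.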

Bibliographic record

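  • Author

    Ercegovac, Vuk.;

  • Author affiliation

    The University of Wisconsin - Madison.;

  • Degree-granting institution The University of Wisconsin - Madison.;
  • Subject Computer Science.
  • Degree Ph.D.
  • Year 2006
  • Pages 135 p.
  • Total pages 135
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Automation technology, computer technology;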
