Retrieval activities in a database consisting of heterogeneous collections of structured text

Abstract

The first part of this paper briefly describes a mathematical framework, called the containment model, that provides the operations and data structures for a text-dominated database with a hierarchical structure. The database is treated as a hierarchical collection of contiguous extents, each extent being a word, a word phrase, a text element, or a non-text element. The filter operations that make up a search command are expressed as containment criteria, which specify whether a contiguous extent is selected or rejected during a search. This formalism, comprising the mathematical framework and its associated language, defines a conceptual layer on which a well-defined higher-level layer can be built: the user interface, which provides functionality closer to the needs of the user and the application domain.
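The containment criteria described above can be illustrated with a minimal sketch. The abstract does not give the paper's operator names or data structures, so everything here is an assumption made for illustration: the `Extent` type (a span of token positions), the three filter functions `containing`, `not_containing`, and `contained_in`, and the sample positions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    """A contiguous span of token positions (hypothetical representation)."""
    start: int  # position of the first token
    end: int    # position of the last token, inclusive
    label: str  # e.g. "section", "para", "word"

    def contains(self, other: "Extent") -> bool:
        # An extent contains another if it spans it entirely.
        return self.start <= other.start and other.end <= self.end


def containing(candidates: List[Extent], required: List[Extent]) -> List[Extent]:
    """Select candidates that contain at least one extent from `required`."""
    return [c for c in candidates if any(c.contains(r) for r in required)]


def not_containing(candidates: List[Extent], excluded: List[Extent]) -> List[Extent]:
    """Reject candidates that contain any extent from `excluded`."""
    return [c for c in candidates if not any(c.contains(e) for e in excluded)]


def contained_in(candidates: List[Extent], containers: List[Extent]) -> List[Extent]:
    """Select candidates that lie inside at least one extent from `containers`."""
    return [c for c in candidates if any(k.contains(c) for k in containers)]


# Assumed sample data: two sections, three paragraphs, two word hits.
sections = [Extent(0, 9, "section"), Extent(10, 19, "section")]
paragraphs = [Extent(0, 4, "para"), Extent(5, 9, "para"), Extent(10, 19, "para")]
hits = [Extent(3, 3, "word"), Extent(12, 12, "word")]

paras_with_hit = containing(paragraphs, hits)
paras_without_hit = not_containing(paragraphs, hits)
hits_in_first_section = contained_in(hits, [sections[0]])
```

Because each filter takes a list of extents and returns a list of extents, filters compose freely, which is one plausible reading of how a search command could be assembled from a small set of operators.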

With the conceptual layer established, we go on to describe the design and implementation of a versatile interface that handles queries for searching and navigating a heterogeneous collection of structured documents. Interface functionality is provided by a set of "worker" modules supported by an "environment" that is the same for all interfaces. The environment lets a worker communicate with the underlying text retrieval engine through a well-defined command protocol based on a small set of filter operators. The overall design emphasizes: a) interface flexibility across a variety of search and browsing capabilities, b) the modular independence of the interface from its underlying retrieval engine, and c) the advantages gained by defining retrieval commands with operators drawn from a text algebra that provides a sound theoretical foundation for the database.
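The worker/environment split can be sketched as follows. This is only an illustration of the layering the abstract describes, not the paper's design: the class names `ToyEngine`, `Environment`, and `SearchWorker`, the three-token command syntax, and the operator names `CONTAINING` and `CONTAINED_IN` are all hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token positions, inclusive


class ToyEngine:
    """Stand-in for a text retrieval engine (hypothetical)."""

    def __init__(self, index: Dict[str, List[Span]]):
        self.index = index  # maps a name to the extents it matches

    def run(self, op: str, left: str, right: str) -> List[Span]:
        a = self.index.get(left, [])
        b = self.index.get(right, [])
        if op == "CONTAINING":
            return [x for x in a if any(x[0] <= y[0] and y[1] <= x[1] for y in b)]
        if op == "CONTAINED_IN":
            return [x for x in a if any(y[0] <= x[0] and x[1] <= y[1] for y in b)]
        raise ValueError(f"unknown filter operator: {op}")


class Environment:
    """Shared by all workers; the only component that talks to the engine."""

    def __init__(self, engine: ToyEngine):
        self._engine = engine

    def send(self, command: str) -> List[Span]:
        # Assumed command protocol: "<left> <OPERATOR> <right>".
        left, op, right = command.split()
        return self._engine.run(op, left, right)


class SearchWorker:
    """One interface capability, expressed purely as protocol commands."""

    def __init__(self, env: Environment):
        self.env = env

    def sections_mentioning(self, word: str) -> List[Span]:
        return self.env.send(f"section CONTAINING {word}")

    def titles_in_sections(self) -> List[Span]:
        return self.env.send("title CONTAINED_IN section")


engine = ToyEngine({
    "section": [(0, 9), (10, 19)],
    "title": [(0, 1), (10, 11)],
    "retrieval": [(3, 3)],
})
worker = SearchWorker(Environment(engine))
found_sections = worker.sections_mentioning("retrieval")
found_titles = worker.titles_in_sections()
```

In this sketch the worker never touches the engine directly, so replacing `ToyEngine` with another engine that honours the same commands would leave the worker unchanged, which mirrors point b) above.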
