Complex Queries over Web Repositories

Abstract

Web repositories, such as the Stanford WebBase repository, manage large heterogeneous collections of Web pages and associated indexes. For effective analysis and mining, these repositories must provide a declarative query interface that supports complex, expressive Web queries. Such queries have two key characteristics: (i) they view a Web repository simultaneously as a collection of text documents, as a navigable directed graph, and as a set of relational tables storing properties of Web pages (length, URL, title, etc.); and (ii) they employ application-specific ranking and ordering relationships over pages and links to filter out and retrieve only the "best" query results. In this paper, we model a Web repository in terms of "Web relations" and describe an algebra for expressing complex Web queries. Our algebra extends traditional relational operators, as well as graph navigation operators, to uniformly handle plain, ranked, and ordered Web relations. In addition, we present an overview of the cost-based optimizer and execution engine we have developed to execute Web queries efficiently over large repositories.
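To make the idea of operators that work uniformly over plain and ranked relations more concrete, here is a minimal sketch. It is not the paper's algebra. Every name in it (`Page`, `WebRelation`, `select`, `rank`, `top_k`, `navigate`) is hypothetical. It assumes a ranked relation is simply a list of pages paired with descending scores, and it uses page length as an arbitrary stand-in for an application-specific ranking function. Ordered relations, indexes, and the cost-based optimizer are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Page:
    """One Web page with a few relational properties (hypothetical schema)."""
    url: str
    title: str
    length: int
    text: str


@dataclass
class WebRelation:
    """A set of pages; `scores` is None for a plain relation, else one score per row."""
    rows: List[Page]
    scores: Optional[List[float]] = None

    @property
    def is_ranked(self) -> bool:
        return self.scores is not None


def select(rel: WebRelation, pred: Callable[[Page], bool]) -> WebRelation:
    """Relational selection; surviving rows keep their scores and relative order."""
    keep = [i for i, page in enumerate(rel.rows) if pred(page)]
    scores = [rel.scores[i] for i in keep] if rel.scores is not None else None
    return WebRelation([rel.rows[i] for i in keep], scores)


def rank(rel: WebRelation, score: Callable[[Page], float]) -> WebRelation:
    """Turn any relation into a ranked one, sorted by descending score."""
    pairs = sorted(((score(p), p) for p in rel.rows), key=lambda sp: sp[0], reverse=True)
    return WebRelation([p for _, p in pairs], [s for s, _ in pairs])


def top_k(rel: WebRelation, k: int) -> WebRelation:
    """Keep only the k best rows of a ranked relation."""
    if rel.scores is None:
        raise ValueError("top_k needs a ranked relation")
    return WebRelation(rel.rows[:k], rel.scores[:k])


def navigate(rel: WebRelation, links: Dict[str, List[str]],
             pages: Dict[str, Page]) -> WebRelation:
    """Follow out-links one step. For ranked input (kept sorted by `rank`), each target
    inherits the score of the first, i.e. highest-ranked, source linking to it."""
    rows: List[Page] = []
    scores: List[float] = []
    seen = set()
    for i, src in enumerate(rel.rows):
        for dst in links.get(src.url, []):
            if dst in seen or dst not in pages:
                continue
            seen.add(dst)
            rows.append(pages[dst])
            if rel.scores is not None:
                scores.append(rel.scores[i])
    return WebRelation(rows, scores if rel.scores is not None else None)


# Tiny hypothetical repository: three pages and their out-links.
pages = {
    "a": Page("a", "DB intro", 1200, "relational database systems"),
    "b": Page("b", "Graphs", 800, "directed graph navigation"),
    "c": Page("c", "Query opt", 3000, "database query optimizer"),
}
links = {"a": ["b", "c"], "c": ["b"]}

base = WebRelation(list(pages.values()))               # plain relation
hits = select(base, lambda p: "database" in p.text)    # text predicate: a, c
ranked = rank(hits, lambda p: float(p.length))         # c (3000.0), a (1200.0)
best = top_k(ranked, 1)                                # c
neighbors = navigate(ranked, links, pages)             # b (3000.0), c (1200.0)
```

The same `select` and `navigate` functions accept both plain and ranked inputs and preserve whichever form they receive. This is one simple way to read "uniformly handle plain, ranked, and ordered Web relations". The paper's actual operator definitions may differ.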
