首页> 外文期刊>Theoretical computer science >Tight lower bounds for query processing on streaming and external memory data
【24h】

Tight lower bounds for query processing on streaming and external memory data

机译:对流和外部存储器数据进行查询处理的下限很小

获取原文
获取原文并翻译 | 示例
           

摘要

It is generally assumed that databases have to reside in external, inexpensive storage because of their sheer size. Current technology for external storage systems presents us with a reality that, performance-wise, a small number of sequential scans of the data is strictly preferable over random data accesses. Database technology–in particular query processing technology–has developed around a notion of memory hierarchies with layers of greatly varying sizes and access times. It seems that the current technologies scale up to their tasks and are very successful, but on closer investigation it may appear that our theoretical understanding of the problems involved–and of optimal algorithms for these problems–is not quite as developed. Recently, data stream processing has become an object of study by the database management community, but from the viewpoint of database theory, this is really a special case of the query processing problem on data in external storage where we are limited to a single scan of the input data. In the present paper we study a clean machine model for external memory and stream processing. We establish tight bounds for the data complexity of Core XPath evaluation and filtering. We show that the number of scans of the external data induces a strict hierarchy (as long as internal memory space is sufficiently small, e.g., polylogarithmic in the size of the input). We also show that neither joins nor sorting are feasible if the product of the number r(n) of scans of the external memory and the size s(n) of the internal memory buffers is sufficiently small, i.e., of size o(n).
机译:通常认为,由于数据库庞大,因此必须驻留在外部廉价存储中。当前的外部存储系统技术为我们提供了一个现实,即从性能角度来看,少量顺序扫描数据绝对比随机数据访问更可取。数据库技术,尤其是查询处理技术,是围绕具有不同大小和访问时间的层的内存层次结构的概念而开发的。看起来当前的技术可以扩展到它们的任务上并且非常成功,但是在更深入的研究中,似乎我们对所涉及问题的理论理解以及对这些问题的最佳算法的理解还不够完善。最近,数据流处理已成为数据库管理社区研究的对象,但是从数据库理论的角度来看,这确实是外部存储中数据的查询处理问题的特例,我们仅限于一次扫描输入数据。在本文中,我们研究了用于外部存储器和流处理的干净机器模型。我们为Core XPath评估和过滤的数据复杂性建立了严格的界限。我们表明,外部数据的扫描次数会导致严格的层次结构(只要内部存储空间足够小,例如输入大小为多对数)。我们还表明,如果外部存储器的扫描次数r(n)与内部存储器缓冲区的大小s(n)的乘积足够小,即大小为o(n),则联接和排序都不可行。 。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号