ACM Transactions on Database Systems

A Declarative Approach to Optimize Bulk Loading into Databases

Abstract

Applications such as warehouse maintenance need to load large data volumes regularly. The efficiency of loading depends on the resources available at the source and target systems. Our work aims to understand the performance criteria involved in bulk loading data into a database and to devise tailored optimization strategies. Unlike commercial systems and previous research on the same topic, our approach follows the fundamental database principle of physical-logical independence. A loading program is represented as a sequence of algebraic expressions. This abstraction enables the use of appropriate algebraic rewritings to optimize a loading program, together with a cost model that takes into account efficiency criteria such as the processing times at the source and target systems and the bandwidth between them. A slow-loading program may be preferable if it does not slow down other applications by consuming too much memory. Thus, we view the problem of optimizing a loading program as finding a compromise between several efficiency criteria. The ability to represent loading programs in an algebra and performance criteria in a cost model has two very desirable properties: reusability and efficiency. Database programmers do not have to write loading programs by hand. In addition, tuning loading programs becomes easier since programmers have better control over the performance criteria specified in the cost model. The algebra captures data transformations that would otherwise have been hardcoded in loading programs. Consequently, richer optimizations can be explored. Finally, our optimization techniques are not specific to one particular system. They can be used for loading data from and to any structured store (e.g., relational databases, structured files). We implemented our ideas in a complete environment for migrating ODBC-compliant databases into the O_2 object-oriented database system.
This prototype provides a declarative view language to specify loading; an interface to specify directives, such as the desired physical organization of the database and constraints on several criteria (e.g., resource and bandwidth consumption); an algebraic optimizer; a code generator; and an execution environment that handles failures and guarantees incremental loading. Our experiments show that a tailored optimization is necessary when loading large data volumes into a database.
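The abstract frames optimization of a loading program as a compromise between efficiency criteria: source and target processing time, bandwidth, and memory use. The following is a minimal sketch of that idea under assumptions of our own, not the paper's cost model: each candidate plan carries hypothetical estimates (the `PlanEstimate` fields, plan names, and numbers are illustrative), source work, transfer, and target work are assumed to run one after another, and memory is treated as a hard constraint while estimated elapsed time is minimized.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PlanEstimate:
    """Hypothetical cost estimates for one candidate loading program."""
    name: str
    source_secs: float     # estimated processing time at the source system
    target_secs: float     # estimated processing time at the target system
    transfer_mb: float     # data shipped from source to target
    peak_memory_mb: float  # peak memory the plan needs while running


def estimated_time(p: PlanEstimate, bandwidth_mb_per_s: float) -> float:
    # Assumption: source work, network transfer, and target work are sequential.
    return p.source_secs + p.transfer_mb / bandwidth_mb_per_s + p.target_secs


def choose_plan(plans: Sequence[PlanEstimate],
                bandwidth_mb_per_s: float,
                max_memory_mb: float) -> Optional[PlanEstimate]:
    """Return the fastest plan whose peak memory fits the limit, or None."""
    feasible = [p for p in plans if p.peak_memory_mb <= max_memory_mb]
    if not feasible:
        return None
    return min(feasible, key=lambda p: estimated_time(p, bandwidth_mb_per_s))


# Illustrative (made-up) candidates: sorting at the target is faster overall
# but needs far more memory than sorting at the source.
plans = [
    PlanEstimate("sort at target", source_secs=10, target_secs=30,
                 transfer_mb=500, peak_memory_mb=2048),
    PlanEstimate("sort at source", source_secs=40, target_secs=20,
                 transfer_mb=500, peak_memory_mb=256),
]

if __name__ == "__main__":
    for limit in (1024, 4096):
        best = choose_plan(plans, bandwidth_mb_per_s=10.0, max_memory_mb=limit)
        print(limit, best.name if best else None)
```

With a 1024 MB memory limit, the sketch picks the slower "sort at source" plan (110 s estimated versus 90 s) because the faster plan would exceed the limit, mirroring the abstract's point that a slower loading program can be preferable when it avoids starving other applications of memory. A real cost model would likely weigh more criteria and allow overlapping stages.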