首页> 外文会议>Data Engineering Workshops (ICDEW), 2010 >Vertical partitioning of relational OLTP databases using integer programming
【24h】

Vertical partitioning of relational OLTP databases using integer programming

机译:使用整数编程对关系OLTP数据库进行垂直分区

获取原文

摘要

A way to optimize performance of relational row store databases is to reduce the row widths by vertically partitioning tables into table fractions in order to minimize the number of irrelevant columns/attributes read by each transaction. This paper considers vertical partitioning algorithms for relational row-store OLTP databases with an H-store-like architecture, meaning that we would like to maximize the number of single-sited transactions. We present a model for the vertical partitioning problem that, given a schema together with a vertical partitioning and a workload, estimates the costs (bytes read/written by storage layer access methods and bytes transferred between sites) of evaluating the workload on the given partitioning. The cost model allows for arbitrarily prioritizing load balancing of sites vs. total cost minimization. We show that finding a minimum-cost vertical partitioning in this model is NP-hard and therefore the problem should obviously not be solved manually by a human DBA. We present two algorithms returning solutions in which single-sitedness of read queries is preserved while allowing column replication (which may allow a drastically reduced cost compared to disjoint partitioning). The first algorithm is a quadratic integer program that finds optimal minimum-cost solutions with respect to the model, and the second algorithm is a more scalable heuristic based on simulated annealing. Experiments show that the algorithms can reduce the cost of the model objective by 37% when applied to the TPC-C benchmark and the heuristic is shown to obtain solutions with costs close to the ones found using the quadratic program.
机译:优化关系行存储数据库性能的一种方法是通过将表垂直划分为表部分来减小行宽,以最小化每个事务读取的无关列/属性的数量。本文考虑了具有类似H商店架构的关系行存储OLTP数据库的垂直分区算法,这意味着我们希望最大化单站点事务的数量。我们为垂直分区问题提供了一个模型,该模型在给定架构以及垂直分区和工作负载的情况下,估计了在给定分区上评估工作负载的成本(存储层访问方法读取/写入的字节数以及站点之间传输的字节数) 。成本模型允许任意优先考虑站点的负载平衡和总成本的最小化。我们表明,在此模型中找到最低成本的垂直分区是NP困难的,因此,显然不应由人类DBA手动解决该问题。我们提出了两种算法返回的解决方案,其中保留了读取查询的单位置,同时允许进行列复制(与不相交的分区相比,可以大大降低成本)。第一种算法是二次整数程序,可找到关于模型的最佳最小成本解决方案,第二种算法是基于模拟退火的更具可扩展性的启发式算法。实验表明,该算法在应用于TPC-C基准测试时,可以将模型目标的成本降低37%,并且启发式算法显示出所获得的解决方案的成本接近于使用二次程序找到的解决方案。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号