Conference: 2009 IEEE 25th International Conference on Data Engineering (ICDE)

Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning


Abstract

One of the most challenging aspects of managing a very large data warehouse is identifying how queries will behave before they start executing. Yet knowing their performance characteristics (their runtimes and resource usage) can solve two important problems. First, every database vendor struggles with managing unexpectedly long-running queries. When these long-running queries can be identified before they start, they can be rejected or scheduled for a time when they will not cause extreme resource contention for the other queries in the system. Second, deciding whether a system can complete a given workload in a given time period (or whether a bigger system is necessary) depends on knowing the resource requirements of the queries in that workload. We have developed a system that uses machine learning to accurately predict the performance metrics of database queries whose execution times range from milliseconds to hours. For training and testing our system, we used both real customer queries and queries generated from an extended set of TPC-DS templates. The extensions mimic queries that caused customer problems. We used these queries to compare how accurately different techniques predict metrics such as elapsed time, records used, disk I/Os, and message bytes. The most promising technique was not only the most accurate, but also predicted these metrics simultaneously, using only information available prior to query execution. We validated the accuracy of this machine learning technique on a number of HP Neoview configurations. We were able to predict individual query elapsed time within 20% of its actual time for 85% of the test queries. Most importantly, we were able to correctly identify both the short and long-running (up to two hours) queries to inform workload management and capacity planning.
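The accuracy figure quoted in the abstract (elapsed time predicted within 20% of the actual time for 85% of test queries) is a within-tolerance rate. The sketch below shows one plausible way to compute such a rate. It is not the authors' evaluation code: the function name `fraction_within_tolerance` and the sample elapsed times are hypothetical and chosen purely for illustration.

```python
def fraction_within_tolerance(predicted, actual, tolerance=0.20):
    """Return the share of predictions whose relative error is at most `tolerance`.

    Hypothetical helper: relative error is measured against the actual value,
    which is one reasonable reading of "within 20% of its actual time".
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one query is required")

    hits = 0
    for p, a in zip(predicted, actual):
        if a <= 0:
            raise ValueError("actual values must be positive")
        # Count the query as a hit when |predicted - actual| <= tolerance * actual.
        if abs(p - a) <= tolerance * a:
            hits += 1
    return hits / len(actual)


# Made-up elapsed times in seconds, spanning sub-second to two-hour queries.
actual_seconds = [0.5, 12.0, 300.0, 7200.0]
predicted_seconds = [0.55, 15.0, 280.0, 6000.0]

rate = fraction_within_tolerance(predicted_seconds, actual_seconds)
print(f"{rate:.0%} of queries predicted within 20% of actual elapsed time")
```

With these invented numbers, three of the four predictions fall inside the 20% band (the 15.0 s prediction for a 12.0 s query does not), so the rate is 75%. The same function could be applied per metric (records used, disk I/Os, message bytes) if each is evaluated separately.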
