Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning

Abstract

One of the most challenging aspects of managing a very large data warehouse is identifying how queries will behave before they start executing. Yet knowing their performance characteristics (their runtimes and resource usage) can solve two important problems. First, every database vendor struggles with managing unexpectedly long-running queries. When these long-running queries can be identified before they start, they can be rejected, or scheduled for a time when they will not cause extreme resource contention for the other queries in the system. Second, deciding whether a system can complete a given workload in a given time period (or whether a bigger system is necessary) depends on knowing the resource requirements of the queries in that workload. We have developed a system that uses machine learning to accurately predict the performance metrics of database queries whose execution times range from milliseconds to hours. For training and testing our system, we used both real customer queries and queries generated from an extended set of TPC-DS templates. The extensions mimic queries that caused customer problems. We used these queries to compare how accurately different techniques predict metrics such as elapsed time, records used, disk I/Os, and message bytes. The most promising technique was not only the most accurate, but also predicted these metrics simultaneously, using only information available prior to query execution. We validated the accuracy of this machine learning technique on a number of HP Neoview configurations. We were able to predict individual query elapsed time within 20% of its actual time for 85% of the test queries. Most importantly, we were able to correctly identify both the short and long-running (up to two-hour) queries to inform workload management and capacity planning.
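
The accuracy figure quoted in the abstract (elapsed time predicted within 20% of the actual time for 85% of test queries) is a share of queries falling inside a relative-error band. A minimal sketch of how such a share could be computed follows; the function name, the tolerance parameter, and the sample timings are all illustrative assumptions and are not taken from the paper.

```python
def fraction_within_tolerance(predicted, actual, tolerance=0.20):
    """Return the fraction of queries whose predicted value lies within
    `tolerance` (as a relative error) of the actual value."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one query")
    hits = sum(
        1
        for p, a in zip(predicted, actual)
        if abs(p - a) <= tolerance * abs(a)
    )
    return hits / len(actual)


# Hypothetical elapsed times in seconds (made-up values, not data from the paper).
actual_seconds = [0.5, 12.0, 300.0, 7200.0]
predicted_seconds = [0.55, 15.0, 310.0, 6000.0]

share = fraction_within_tolerance(predicted_seconds, actual_seconds)
print(f"{share:.0%} of queries predicted within 20% of actual elapsed time")
```

With these made-up timings, three of the four predictions fall inside the 20% band. The same function could be applied to the other metrics the abstract names, such as disk I/Os or message bytes, by passing those values instead.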

Bibliographic details

Source
Data Engineering, ICDE, 2009 IEEE 25th International Conference on | 2009 | pp. 592-603 | 12 pages
Authors
Ganapathi, Archana; Kuno, Harumi; Dayal, Umeshwar; Wiener, Janet L.; Fox, Armando; Jordan, Michael; Patterson, David
Original format: PDF
Chinese Library Classification: Industrial technology
Keywords
database performance prediction; machine learning; operational business intelligence

