IEEE International Conference on Cloud Computing

Towards Selecting Best Combination of SQL-on-Hadoop Systems and JVMs

Abstract

Hadoop is the de facto standard big-data middleware, and many frameworks have been developed on top of it. Because many SQL-on-Hadoop systems are available, we often need to decide which engine is best for our queries. The choice extends beyond query engines to Java virtual machines (JVMs). As these systems become more complex, however, no single system performs best in all cases, and a mismatched system can degrade performance greatly. To obtain the best performance, it is important to know which types of queries suit each system and then to schedule each query on the appropriate system. In this paper, we evaluated the TPC-DS benchmark on combinations of query engines (Spark and Tez) and JVMs (J9 and OpenJDK). We found that using a different engine leads to a slowdown of more than 10 times and that using a different JVM leads to a slowdown of 3 times. We also analyzed the characteristics of each combination and then proposed classification models that select the best combination of systems from a generated query plan. With the classifier, we achieved a total performance improvement of up to two times.
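
The selection idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the plan features (join count, scan count), the runtimes, and the 1-nearest-neighbour rule are hypothetical placeholders, not the paper's features, measurements, or classification models. The sketch only shows how a per-query slowdown, a predicted combination, and a total speedup could be computed from per-combination timings.

```python
# Hypothetical illustration only: the features, timings, and 1-NN rule below
# are assumptions, not data or models taken from the paper.
from math import dist

# Made-up training data: (plan features, runtime in seconds per combination).
# Assumed features: (number of joins, number of table scans) in a query plan.
TRAINING = [
    ((1, 2), {"Spark+J9": 40.0, "Spark+OpenJDK": 30.0,
              "Tez+J9": 12.0, "Tez+OpenJDK": 10.0}),
    ((6, 7), {"Spark+J9": 50.0, "Spark+OpenJDK": 60.0,
              "Tez+J9": 200.0, "Tez+OpenJDK": 180.0}),
    ((3, 4), {"Spark+J9": 25.0, "Spark+OpenJDK": 45.0,
              "Tez+J9": 30.0, "Tez+OpenJDK": 35.0}),
]


def best_combo(runtimes):
    """Return the engine+JVM combination with the smallest runtime."""
    return min(runtimes, key=runtimes.get)


def slowdown(runtimes):
    """Ratio of the slowest to the fastest combination for one query."""
    return max(runtimes.values()) / min(runtimes.values())


def predict(features):
    """1-nearest-neighbour: reuse the best combination of the closest known plan."""
    _, runtimes = min(TRAINING, key=lambda row: dist(row[0], features))
    return best_combo(runtimes)


def total_speedup(fixed_combo):
    """Total runtime of one fixed combination divided by the per-query best total."""
    fixed = sum(runtimes[fixed_combo] for _, runtimes in TRAINING)
    per_query_best = sum(min(runtimes.values()) for _, runtimes in TRAINING)
    return fixed / per_query_best
```

For example, `predict((1, 1))` picks the combination that was fastest for the nearest known plan, and `total_speedup("Spark+J9")` compares always running one combination against choosing the best one per query. A real classifier would be trained on measured runtimes and richer query-plan features.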