Venue: IEEE International Conference on Data Engineering

SQL-SA for big data discovery: polymorphic and parallelizable SQL user-defined scalar and aggregate infrastructure in Teradata Aster 6.20


Abstract

There is increasing demand to integrate big data analytic systems using SQL. Given the vast ecosystem of SQL applications, enabling SQL capabilities allows big data platforms to expose their analytic potential to a wide variety of end users, accelerating discovery processes and providing significant business value. Most existing big data frameworks are based on one particular programming model, such as MapReduce or Graph. As a result, data scientists are often forced to manually create ad hoc data pipelines that connect various big data tools and platforms to serve their analytic needs. When the analytic tasks change, these data pipelines may be costly to modify and maintain. In this paper we present SQL-SA, a polymorphic and parallelizable SQL scalar and aggregate infrastructure in Aster 6.20. This infrastructure extends Aster 6's MapReduce and Graph capabilities to support polymorphic user-defined scalar and aggregate functions using flexible SQL syntax. The implementation extensively enhances the main Aster components, including query syntax, API, planning, and execution. By integrating these new user-defined scalar and aggregate functions with Aster MapReduce and Graph functions, Aster 6.20 enables data scientists to combine diverse programming models in a single SQL statement. The statement is automatically converted to an optimal data pipeline and executed in parallel. Using a real-world business problem and data, Aster 6.20 demonstrates a significant performance advantage (25%+) over Hadoop Pig and Hive.
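The abstract does not show the SQL-SA API, so the following is only a generic sketch of what makes a user-defined aggregate parallelizable: each partition builds a partial state on its own, and the partial states are then merged and finalized. All names here (`MeanState`, `accumulate`, `merge`, `finalize`, `parallel_mean`) are hypothetical and are not taken from Aster; the partitions are processed sequentially to stand in for parallel workers.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class MeanState:
    """Partial aggregation state: enough to merge without revisiting rows."""
    total: float = 0.0
    count: int = 0


def accumulate(state: MeanState, value: float) -> MeanState:
    # Row-level step; each partition runs this independently of the others.
    return MeanState(state.total + value, state.count + 1)


def merge(a: MeanState, b: MeanState) -> MeanState:
    # Combines two partial states. The operation is associative and
    # commutative, so partial results can be merged in any order.
    return MeanState(a.total + b.total, a.count + b.count)


def finalize(state: MeanState) -> Optional[float]:
    # Turns the merged state into the final result; None for empty input.
    return state.total / state.count if state.count else None


def parallel_mean(partitions: Sequence[Iterable[float]]) -> Optional[float]:
    # Hypothetical driver: one partial state per partition (simulated
    # sequentially here), followed by a merge step and a finalize step.
    partials = [reduce(accumulate, part, MeanState()) for part in partitions]
    return finalize(reduce(merge, partials, MeanState()))


if __name__ == "__main__":
    print(parallel_mean([[1, 2, 3], [4, 5], []]))
```

Because `merge` does not depend on how rows were split across partitions, the same function definition gives the same answer for any partitioning, which is the property a planner needs in order to run an aggregate in parallel.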
