Published in: International Journal of Applied Mathematics and Computer Science

PARALLELIZING USER-DEFINED FUNCTIONS IN THE ETL WORKFLOW USING ORCHESTRATION STYLE SHEETS

Abstract

Today's ETL tools let developers write custom code as user-defined functions (UDFs) to extend the expressiveness of the standard ETL operators. While this makes it easy to add new functionality, it also carries the risk that the custom code is not designed to be optimized, e.g., through parallelism, and consequently performs poorly in data-intensive ETL workflows. In this paper, we present a novel framework that lets the ETL developer choose a design pattern for writing parallelizable code and that generates a configuration for executing the UDFs in a distributed environment. This enables ETL developers with minimal expertise in distributed and parallel computing to develop UDFs without handling parallelization configurations and their complexities. We perform experiments on large-scale datasets based on TPC-DS and BigBench. The results show that our approach significantly reduces the effort required of ETL developers while generating efficient parallel configurations that support complex, data-intensive ETL tasks.
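
The abstract does not describe the framework's API, so the following is only a minimal sketch of the general idea: a developer writes a plain row-wise UDF, and a separate "map"-style pattern decides how to partition the data and run the UDF in parallel. All names here (`add_tax`, `partition`, `run_map_pattern`, the `parallelism` parameter) are hypothetical and are not taken from the paper. A thread pool stands in for the distributed environment to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, float]


def add_tax(row: Row) -> Row:
    """Hypothetical row-wise UDF: adds a tax-inclusive price column."""
    out = dict(row)
    out["price_with_tax"] = round(row["price"] * 1.25, 2)
    return out


def partition(rows: List[Row], n_parts: int) -> List[List[Row]]:
    """Split rows into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def run_map_pattern(
    udf: Callable[[Row], Row], rows: List[Row], parallelism: int = 4
) -> List[Row]:
    """Apply a row-wise UDF to every row, one chunk per worker.

    Assumption: the UDF is stateless, so chunks can be processed
    independently. The UDF itself contains no parallel code.
    """
    chunks = partition(rows, parallelism)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map returns results in chunk order, so row order is preserved.
        results = pool.map(lambda chunk: [udf(r) for r in chunk], chunks)
        return [row for chunk in results for row in chunk]
```

In this sketch the degree of parallelism is the only configuration value, and it lives outside the UDF. That separation is the point being illustrated; how the paper's framework actually derives its configuration is not stated in the abstract.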
