
Performance Tuning in Distributed Processing of ETL


Abstract

Extract, transform, and load (ETL) is a common and important technology for building data warehouses, including those that support business intelligence. When a very complex SQL query is issued to acquire data from a transaction system into a data warehouse, it involves many operations, including table joins, sorting, and aggregation. Such operations require substantial retrieval work and transfer large volumes of data from the tables. Intensive querying of this kind often causes performance problems, and it commonly has a negative impact on data instance resources. Improving ETL performance is therefore both critical and challenging. This paper presents a parallel processing solution that splits a big, complex SQL query into small pieces that run in a distributed computing manner. The proposed method aims to reduce the cost of computation while ensuring data integrity among joined tables. The idea can be verified through selected performance-tuning test beds.
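The abstract does not spell out how the query is split, so the following is only a minimal sketch of the general idea, under stated assumptions: a join-plus-aggregation query is partitioned by primary-key range on one table, each piece runs in its own worker, and the partial sums are merged. The schema (`orders`, `customers`), the range-based split, and all function names are hypothetical illustrations, not the paper's method; a local SQLite file and a thread pool stand in for a real distributed cluster.

```python
import os
import sqlite3
import tempfile
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "big" query: join two tables, then aggregate.
FULL_QUERY = """
    SELECT c.region, SUM(o.amount)
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY c.region
"""
# The same query restricted to one half-open range of order ids.
CHUNK_QUERY = """
    SELECT c.region, SUM(o.amount)
    FROM orders o JOIN customers c ON o.customer_id = c.id
    WHERE o.id >= ? AND o.id < ?
    GROUP BY c.region
"""

def build_demo_db(path, n_orders=1000):
    """Create a small demo database (schema and data are made up)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, "region_%d" % (i % 3)) for i in range(10)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, i % 10, i % 7 + 1) for i in range(n_orders)],
    )
    conn.commit()
    conn.close()

def run_chunk(path, lo, hi):
    """Run one piece of the query on its own connection."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(CHUNK_QUERY, (lo, hi)).fetchall()
    finally:
        conn.close()

def run_split(path, n_orders, n_chunks=4):
    """Split by order-id range, run pieces in parallel, merge partial sums."""
    step = -(-n_orders // n_chunks)  # ceiling division
    ranges = [(lo, lo + step) for lo in range(0, n_orders, step)]
    totals = defaultdict(int)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for rows in pool.map(lambda r: run_chunk(path, *r), ranges):
            for region, subtotal in rows:
                totals[region] += subtotal
    return dict(totals)

def run_full(path):
    """Run the unsplit query as the reference result."""
    conn = sqlite3.connect(path)
    try:
        return dict(conn.execute(FULL_QUERY).fetchall())
    finally:
        conn.close()

def demo():
    """Build the demo data and return (full result, split-and-merged result)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        build_demo_db(path)
        return run_full(path), run_split(path, 1000)
```

Calling `demo()` returns the result of the single large query and the merged result of the split queries, which should agree. In this sketch every order row falls into exactly one id range, so no joined row is counted twice or dropped; that is one simple way to keep the joined result consistent, and it works here because `SUM` can be merged by adding partial results.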
