Conference: International Conference on Very Large Data Bases (VLDB)

TPC-DI: The First Industry Benchmark for Data Integration



Abstract

Historically, the process of synchronizing a decision support system with data from operational systems has been called Extract, Transform, Load (ETL), and the tools supporting this process have been called ETL tools. More recently, ETL has been superseded by the broader term data integration (DI). DI describes the process of extracting and combining data from a variety of source formats, transforming that data into a unified data model representation, and loading it into a data store. This happens in many scenarios: data acquisition for business intelligence, analytics, and data warehousing, but also synchronization of data between operational applications, data migrations and conversions, master data management, enterprise data sharing, and delivery of data services in a service-oriented architecture, among others. Because these scenarios rely on up-to-date information, it is critical to implement a high-performance, scalable, and easily maintainable data integration system. This matters all the more as the complexity, variety, and volume of data keep increasing, making the performance of data integration systems ever more critical. Despite the importance of high-performance DI systems, there has been no industry standard for measuring and comparing their performance. Acknowledging this void, the TPC (Transaction Processing Performance Council) has released TPC-DI, an innovative benchmark for data integration. This paper explains the motivation behind its development, describes its main characteristics, including the workload, run rules, and metric, and explains key decisions.
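The extract, transform, and load steps described above can be illustrated with a minimal Python sketch. It is not part of TPC-DI and does not reflect its actual workload: the two inline sources (one CSV, one JSON-lines), the `Customer` record, and the `dim_customer` table are all hypothetical, chosen only to show data from different source formats being normalized into one unified representation and loaded into a data store (here an in-memory SQLite database).

```python
import csv
import io
import json
import sqlite3
from dataclasses import dataclass

# Hypothetical source data in two different formats.
CSV_SOURCE = """customer_id,name,balance
1,Alice,100.50
2,Bob,75.00
"""

JSON_SOURCE = """{"id": 3, "full_name": "Carol", "balance_cents": 12000}
{"id": 4, "full_name": "Dave", "balance_cents": 500}
"""


@dataclass
class Customer:
    """Unified target representation shared by all sources (hypothetical)."""
    customer_id: int
    name: str
    balance: float


def extract_csv(text: str) -> list[dict]:
    # Extract: parse CSV rows into dictionaries keyed by the header line.
    return list(csv.DictReader(io.StringIO(text)))


def extract_json_lines(text: str) -> list[dict]:
    # Extract: one JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def transform_csv(row: dict) -> Customer:
    # Transform: map CSV column names and string values onto the unified model.
    return Customer(int(row["customer_id"]), row["name"].strip(), float(row["balance"]))


def transform_json(rec: dict) -> Customer:
    # Transform: this source stores balances in cents, so normalize to units.
    return Customer(int(rec["id"]), rec["full_name"].strip(), rec["balance_cents"] / 100)


def load(conn: sqlite3.Connection, customers: list[Customer]) -> None:
    # Load: write the unified records into the target table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, balance REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?, ?)",
        [(c.customer_id, c.name, c.balance) for c in customers],
    )
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> int:
    # Combine both sources into one list of unified records, then load it.
    customers = [transform_csv(r) for r in extract_csv(CSV_SOURCE)]
    customers += [transform_json(r) for r in extract_json_lines(JSON_SOURCE)]
    load(conn, customers)
    return len(customers)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(conn), "rows loaded")
    for row in conn.execute("SELECT * FROM dim_customer ORDER BY customer_id"):
        print(row)
```

Running the script loads four rows, with the JSON source's cent amounts converted so that all balances share one unit. A real DI system would add concerns this sketch omits, such as validation, incremental (change-only) loads, and error handling, which are the kinds of work a benchmark like TPC-DI aims to measure.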
