International Conference on Very Large Data Bases (VLDB)

TPC-DI: The First Industry Benchmark for Data Integration

Abstract

Historically, the process of synchronizing a decision support system with data from operational systems has been referred to as Extract, Transform, Load (ETL), and the tools supporting this process have been referred to as ETL tools. Recently, ETL has been replaced by the more comprehensive term data integration (DI). DI describes the process of extracting and combining data from a variety of data source formats, transforming that data into a unified data model representation, and loading it into a data store. This is done in a variety of scenarios, such as data acquisition for business intelligence, analytics, and data warehousing, but also synchronization of data between operational applications, data migrations and conversions, master data management, enterprise data sharing, and delivery of data services in a service-oriented architecture, amongst others. Because these scenarios rely on up-to-date information, it is critical to implement a high-performing, scalable, and easily maintainable data integration system. This is especially important as the complexity, variety, and volume of data keep increasing, making the performance of data integration systems ever more critical. Despite the importance of a high-performing DI system, there has been no industry standard for measuring and comparing the performance of such systems. Recognizing this gap, the TPC has released TPC-DI, an innovative benchmark for data integration. This paper explains the motivation behind its development, describes its main characteristics, including the workload, run rules, and metric, and discusses key design decisions.
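The extract, transform, and load steps in the definition of DI above can be illustrated with a minimal Python sketch. It is not the TPC-DI workload or any part of the benchmark: the two source feeds (a CSV export and a JSON export), their field names, and the `customer` target table are hypothetical, and an in-memory SQLite database stands in for the data store.

```python
"""Minimal extract-transform-load sketch (illustrative only, not TPC-DI).

Assumptions: two hypothetical source feeds (a CSV export and a JSON
export) describe customers with different field names; the unified
target is a hypothetical `customer` table in an in-memory SQLite database.
"""
import csv
import io
import json
import sqlite3

# Hypothetical source data in two different formats.
CSV_SOURCE = "cust_id,full_name,country\n1,Ada Lovelace,uk\n2,Alan Turing,UK\n"
JSON_SOURCE = '[{"id": 3, "first": "Grace", "last": "Hopper", "nation": "us"}]'


def extract():
    """Extract: read raw records from each source format."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: map both formats onto one unified (id, name, country) model."""
    unified = []
    for r in csv_rows:
        unified.append((int(r["cust_id"]), r["full_name"].strip(), r["country"].upper()))
    for r in json_rows:
        name = f'{r["first"]} {r["last"]}'
        unified.append((int(r["id"]), name, r["nation"].upper()))
    return unified


def load(rows, conn):
    """Load: write the unified records into the target data store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract -> transform -> load and return the populated connection."""
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    return conn


if __name__ == "__main__":
    conn = run_etl()
    for row in conn.execute("SELECT id, name, country FROM customer ORDER BY id"):
        print(row)
```

Running the script prints the three unified customer rows, with country codes normalized to upper case. Supporting another source format in this sketch means adding one more reader in `extract` and one more mapping in `transform`, while `load` stays unchanged because it only sees the unified model.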