Journal: IEEE Transactions on Knowledge and Data Engineering

A scalable approach to integrating heterogeneous aggregate views of distributed databases


Abstract

Aggregate views are commonly used for summarizing information held in very large databases such as those encountered in data warehousing, large scale transaction management, and statistical databases. Such applications often involve distributed databases that have developed independently and therefore may exhibit incompatibility, heterogeneity, and data inconsistency. We are here concerned with the integration of aggregates that have heterogeneous classification schemes where local ontologies, in the form of such classification schemes, may be mapped onto a common ontology. In previous work, we have developed a method for the integration of such aggregates; the method previously developed is efficient, but cannot handle innate data inconsistencies that are likely to arise when a large number of databases are being integrated. In this paper, we develop an approach that can handle data inconsistencies and is thus inherently much more scalable. In our new approach, we first construct a dynamic shared ontology by analyzing the correspondence graph that relates the heterogeneous classification schemes; the aggregates are then derived by minimization of the Kullback-Leibler information divergence using the EM (Expectation-Maximization) algorithm. Thus, we may assess whether global queries on such aggregates are answerable, partially answerable, or unanswerable in advance of computing the aggregates themselves.
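The integration step can be illustrated with a minimal sketch. It assumes each database reports counts over local classes, and that each local class has already been mapped to the set of shared-ontology categories it covers. The code applies the standard EM iteration for incompletely classified counts, which maximizes the multinomial likelihood and so minimizes the Kullback-Leibler divergence between the observed and fitted class proportions. It is not the paper's own implementation: the names `View` and `integrate_aggregates`, the categories `a`, `b`, `c`, and all counts are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# One aggregate view: (local class, count) pairs, where a local class is the
# set of shared-ontology categories it covers. (Hypothetical representation.)
View = List[Tuple[FrozenSet[str], float]]


def integrate_aggregates(
    views: List[View], iters: int = 500, tol: float = 1e-12
) -> Dict[str, float]:
    """Estimate proportions over the shared categories by EM."""
    categories = sorted({c for view in views for cls, _ in view for c in cls})
    total = sum(count for view in views for _, count in view)
    pi = {c: 1.0 / len(categories) for c in categories}  # uniform start

    for _ in range(iters):
        expected = dict.fromkeys(categories, 0.0)
        # E-step: split each local count over the categories it covers,
        # in proportion to the current estimate pi.
        for view in views:
            for cls, count in view:
                if count == 0:
                    continue
                mass = sum(pi[c] for c in cls)
                for c in cls:
                    expected[c] += count * pi[c] / mass
        # M-step: renormalise the expected counts into proportions.
        new_pi = {c: expected[c] / total for c in categories}
        delta = max(abs(new_pi[c] - pi[c]) for c in categories)
        pi = new_pi
        if delta < tol:
            break
    return pi


# Two hypothetical databases that group the categories a, b, c differently.
view_1: View = [(frozenset({"a"}), 30), (frozenset({"b", "c"}), 70)]
view_2: View = [(frozenset({"a", "b"}), 50), (frozenset({"c"}), 50)]
estimate = integrate_aggregates([view_1, view_2])
```

Neither view reports category `b` on its own, yet the two views together determine its proportion. For this toy input the estimate converges to roughly 0.3, 0.2, and 0.5 for `a`, `b`, and `c`.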
