British National Conference on Databases (BNCOD 24), 3-5 July 2007, Glasgow, GB
Knowledge Discovery from Semantically Heterogeneous Aggregate Databases Using Model-Based Clustering

Abstract

When distributed databases are developed independently, they may be semantically heterogeneous with respect to data granularity, schema information and the embedded semantics. However, most traditional distributed knowledge discovery (DKD) methods assume that the distributed databases derive from a single virtual global table, in which they share the same semantics and data structures. This data heterogeneity and the underlying semantics pose a considerable challenge for DKD. In this paper, we propose a model-based clustering method for aggregate databases, where the heterogeneous schema structure is due to heterogeneous classification schemes. The underlying semantics can be captured by different clusters. The clustering is carried out via a mixture model, where each component of the mixture corresponds to a different virtual global table. An advantage of our approach is that the algorithm resolves the heterogeneity as part of the clustering process, without first having to homogenise the heterogeneous local schemas to a shared schema. The algorithm is evaluated using both real and synthetic data. Its scalability is tested against the number of databases to be clustered, the number of clusters, and the size of the databases. The relationship between performance and complexity is also evaluated. Our experiments show that this approach has good potential for scalable integration of semantically heterogeneous databases.
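
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: each aggregate database reports counts over its own grouping of a common set of fine-grained categories, each mixture component is a multinomial distribution over those fine categories (standing in for one "virtual global table"), and the mixture is fitted with EM. The function names, the data layout and the toy counts are all hypothetical and are not taken from the paper.

```python
import math

def log_lik(p, partition, counts):
    # Log-likelihood of grouped counts under fine-category probabilities p.
    # The probability of a group is the sum of its fine-category probabilities.
    return sum(n * math.log(sum(p[k] for k in group))
               for group, n in zip(partition, counts) if n > 0)

def em_cluster(databases, n_fine, init_probs, n_iter=50):
    """Cluster aggregate databases with a multinomial mixture (illustrative).

    databases  : list of (partition, counts); partition is a list of tuples
                 of fine-category indices, counts has one total per tuple.
    init_probs : one strictly positive probability vector per component.
    """
    n_comp = len(init_probs)
    probs = [list(p) for p in init_probs]
    weights = [1.0 / n_comp] * n_comp
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each database,
        # computed in log space to avoid underflow.
        resp = []
        for partition, counts in databases:
            logs = [math.log(max(w, 1e-300)) + log_lik(p, partition, counts)
                    for w, p in zip(weights, probs)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights, then spread each grouped count over
        # its fine categories in proportion to the current probabilities.
        weights = [sum(r[c] for r in resp) / len(databases)
                   for c in range(n_comp)]
        new_probs = []
        for c in range(n_comp):
            expected = [1e-9] * n_fine  # tiny smoothing keeps probabilities positive
            for (partition, counts), r in zip(databases, resp):
                for group, n in zip(partition, counts):
                    q = sum(probs[c][k] for k in group)
                    for k in group:
                        expected[k] += r[c] * n * probs[c][k] / q
            total = sum(expected)
            new_probs.append([e / total for e in expected])
        probs = new_probs
    labels = [max(range(n_comp), key=lambda c: r[c]) for r in resp]
    return weights, probs, labels

# Toy data (made up): three fine categories, four databases that group
# those categories differently.
databases = [
    ([(0,), (1,), (2,)], [80, 15, 5]),
    ([(0, 1), (2,)], [190, 10]),
    ([(0,), (1,), (2,)], [10, 20, 70]),
    ([(0,), (1, 2)], [20, 180]),
]
weights, probs, labels = em_cluster(
    databases, n_fine=3,
    init_probs=[[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
print(labels)
```

In this sketch the databases are never mapped to a shared schema beforehand: the differing groupings enter only through the group probabilities inside the likelihood, which is one way to read the abstract's claim that heterogeneity is resolved as part of the clustering process.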