Journal: Knowledge and Information Systems

Clustering semantically heterogeneous distributed aggregate databases



Abstract

Databases developed independently in a common open distributed environment may be heterogeneous with respect to both data schema and the embedded semantics. Managing schema and semantic heterogeneities brings considerable challenges to learning from distributed data and to supporting applications that involve cooperation between different organisations. In this paper, we are concerned mainly with heterogeneous databases that hold aggregates on a set of attributes, which are often the result of materialised views of native large-scale distributed databases. A model-based clustering algorithm is proposed that constructs a mixture model in which each component corresponds to a cluster, capturing the contextual heterogeneity among databases from different populations. Schema heterogeneity, which can be recast as incomplete information, is handled within the clustering process using Expectation-Maximisation estimation, and integration is carried out within a clustering iteration. Because the proposed algorithm resolves schema heterogeneity as part of the clustering process, it avoids transforming the data into a unified schema. An evaluation of classification, scalability and reliability, using both real and synthetic data, demonstrates that the algorithm can achieve good performance by incorporating all of the information from the available heterogeneous data. Our clustering approach has great potential for scalable knowledge discovery from semantically heterogeneous databases and for applications in an open distributed environment, such as the Semantic Web.
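The abstract gives no formulas, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that each database reports counts over its own partition of a shared set of fine-grained categories (a coarser partition plays the role of schema heterogeneity recast as incomplete information) and that each cluster is a multinomial profile over those categories. All names (`init_profiles`, `em_cluster_aggregates`, the example data) are hypothetical, and the initialisation is an arbitrary deterministic choice.

```python
import math


def init_profiles(databases, n_categories, n_clusters):
    """Seed each cluster from one evenly spaced database (an arbitrary choice)."""
    profiles = []
    for k in range(n_clusters):
        partition, counts = databases[k * len(databases) // n_clusters]
        raw = [1.0] * n_categories  # add-one smoothing keeps probabilities positive
        for group, n in zip(partition, counts):
            for j in group:
                raw[j] += n / len(group)  # spread a coarse count evenly over its group
        total = sum(raw)
        profiles.append([v / total for v in raw])
    return profiles


def em_cluster_aggregates(databases, n_categories, n_clusters, n_iter=50):
    """Cluster aggregate databases with EM on a mixture of multinomials.

    databases: list of (partition, counts); partition is a list of groups of
    fine-category indices and counts[g] is the count reported for group g.
    Assumes n_iter >= 1 and at least n_clusters databases.
    """
    profiles = init_profiles(databases, n_categories, n_clusters)
    weights = [1.0 / n_clusters] * n_clusters
    eps = 1e-9  # small floor so no weight or probability becomes exactly zero

    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each database,
        # using the likelihood of the counts as reported in its own schema.
        resp = []
        for partition, counts in databases:
            log_post = []
            for k in range(n_clusters):
                ll = math.log(weights[k])
                for group, n in zip(partition, counts):
                    ll += n * math.log(sum(profiles[k][j] for j in group))
                log_post.append(ll)
            top = max(log_post)
            unnorm = [math.exp(v - top) for v in log_post]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])

        # M-step: split each coarse count over its fine categories in
        # proportion to the current profile, weighted by the responsibilities.
        new_profiles = [[eps] * n_categories for _ in range(n_clusters)]
        for (partition, counts), r in zip(databases, resp):
            for k in range(n_clusters):
                for group, n in zip(partition, counts):
                    mass = sum(profiles[k][j] for j in group)
                    for j in group:
                        new_profiles[k][j] += r[k] * n * profiles[k][j] / mass
        profiles = [[v / sum(row) for v in row] for row in new_profiles]
        weights = [
            (sum(r[k] for r in resp) + eps) / (len(databases) + n_clusters * eps)
            for k in range(n_clusters)
        ]

    labels = [max(range(n_clusters), key=lambda k: r[k]) for r in resp]
    return weights, profiles, labels


# Made-up example: six databases over four fine categories, three schemas.
fine = [[0], [1], [2], [3]]
coarse = [[0, 1], [2, 3]]
example = [
    (fine, [40, 40, 10, 10]),
    (coarse, [85, 15]),
    ([[0], [1], [2, 3]], [45, 40, 15]),
    (fine, [10, 10, 40, 40]),
    (coarse, [20, 80]),
    ([[0, 1], [2], [3]], [15, 45, 40]),
]
weights, profiles, labels = em_cluster_aggregates(example, 4, 2)
print(labels)
```

In this sketch no database is ever rewritten into a common schema: the E-step scores each database on the group probabilities implied by its own partition, and the M-step redistributes coarse counts only in expectation.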
