Clustering semantically heterogeneous distributed aggregate databases

Shuai Zhang; Sally I. McClean; Bryan W. Scotney


Abstract

Databases developed independently in a common open distributed environment may be heterogeneous with respect to both data schema and the embedded semantics. Managing schema and semantic heterogeneities brings considerable challenges to learning from distributed data and to supporting applications that involve cooperation between different organisations. In this paper, we are concerned mainly with heterogeneous databases that hold aggregates on a set of attributes, which are often the result of materialised views of native large-scale distributed databases. A model-based clustering algorithm is proposed to construct a mixture model in which each component corresponds to a cluster that captures the contextual heterogeneity among databases from different populations. Schema heterogeneity, which can be recast as incomplete information, is handled within the clustering process using Expectation-Maximisation estimation, and integration is carried out within a clustering iteration. Our proposed algorithm resolves the schema heterogeneity as part of the clustering process, thus avoiding transformation of the data into a unified schema. Results of algorithm evaluation on classification, scalability and reliability, using both real and synthetic data, demonstrate that our algorithm can achieve good performance by incorporating all of the information from available heterogeneous data. Our clustering approach has great potential for scalable knowledge discovery from semantically heterogeneous databases and for applications in an open distributed environment, such as the Semantic Web.
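The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes multinomial mixture components over a shared set of fine-grained categories, and it assumes each database reports counts over its own cells, where a cell is a union of fine categories (the "incomplete information" view of schema heterogeneity). The function name `em_cluster_aggregates`, the data layout, the toy counts and the starting probabilities are all hypothetical.

```python
import math


def em_cluster_aggregates(databases, init_probs, n_iter=50):
    """EM for a multinomial mixture over databases with coarsened schemas.

    databases  : list of (partition, counts); partition is a list of cells,
                 each cell a list of fine-category indices, and counts holds
                 the aggregate count reported for each cell.
    init_probs : starting category probabilities, one row per cluster.
    """
    n_clusters = len(init_probs)
    n_categories = len(init_probs[0])
    probs = [list(row) for row in init_probs]
    weights = [1.0 / n_clusters] * n_clusters
    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster membership of each database. A cell's
        # probability is the sum of the fine-category probabilities it covers.
        resp = []
        for partition, counts in databases:
            log_post = []
            for k in range(n_clusters):
                ll = math.log(weights[k])
                for cell, n in zip(partition, counts):
                    ll += n * math.log(sum(probs[k][j] for j in cell))
                log_post.append(ll)
            top = max(log_post)
            unnorm = [math.exp(v - top) for v in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: spread each cell's count over its fine categories in
        # proportion to the current estimates, weighted by membership.
        expected = [[1e-9] * n_categories for _ in range(n_clusters)]
        for (partition, counts), r in zip(databases, resp):
            for k in range(n_clusters):
                for cell, n in zip(partition, counts):
                    cell_mass = sum(probs[k][j] for j in cell)
                    for j in cell:
                        expected[k][j] += r[k] * n * probs[k][j] / cell_mass
        probs = [[v / sum(row) for v in row] for row in expected]
        # Floor the mixing weights so the next log() never sees zero.
        weights = [max(sum(r[k] for r in resp) / len(databases), 1e-12)
                   for k in range(n_clusters)]
    return weights, probs, resp


# Toy data (made up): three fine categories, three different schemas.
full = [[0], [1], [2]]
merge_01 = [[0, 1], [2]]
merge_12 = [[0], [1, 2]]
databases = [
    (full, [80, 15, 5]), (merge_01, [95, 5]), (merge_12, [78, 22]),
    (full, [5, 15, 80]), (merge_01, [22, 78]), (merge_12, [6, 94]),
]
init_probs = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]  # assumed starting point

weights, probs, resp = em_cluster_aggregates(databases, init_probs)
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
print(labels)
```

In this sketch no database is ever converted to a common schema: coarse cells enter the likelihood through summed probabilities, and their counts are redistributed over fine categories only inside the M-step.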


Bibliographic details

Source
Knowledge and Information Systems | 2014, Issue 2 | 34 pages
Authors
Shuai Zhang; Sally I. McClean; Bryan W. Scotney;
Original format: PDF
Language: English
Chinese Library Classification: Automation systems theory;
Keywords
Model-based clustering; Semantically heterogeneous databases; EM algorithm; Unsupervised learning;


Similar documents


1. Clustering semantically heterogeneous distributed aggregate databases [J]. Shuai Zhang, Sally I. McClean, Bryan W. Scotney. Knowledge and Information Systems. 2014, Issue 2.
2. Integrating Different Semantics of Classification Levels in Heterogeneous Distributed Database System [J]. Min-Shiang Hwang, Wei-Pang Yang. Journal of Applied Sciences. 2002, Issue 5.
3. A scalable approach to integrating heterogeneous aggregate views of distributed databases [J]. McClean S., Scotney B., Greer K. IEEE Transactions on Knowledge and Data Engineering. 2003, Issue 1.
4. Knowledge Discovery from Semantically Heterogeneous Aggregate Databases Using Model-Based Clustering [C]. Shuai Zhang, Sally McClean, Bryan Scotney. British National Conference on Databases. 2007.
5. Integration of Heterogeneous Data for Protein Ontology Database Using Semantic Web Technology [D]. Li, Xiang. 2018.
6. Semalytics: a semantic analytics platform for the exploration of distributed and heterogeneous cancer data in translational research [O]. Andrea Mignone, Alberto Grand, Alessandro Fiori. 2019.
7. Using Semantic Web Technologies in Heterogeneous Distributed Database System: A Case Study for Managing Energy Data on Mobile Devices [O]. Zhan Liu, Anne Le Calvé, Fabian Cretton. 2014.
