Knowledge Discovery from Semantically Heterogeneous Aggregate Databases Using Model-Based Clustering


Abstract

When distributed databases are developed independently, they may be semantically heterogeneous with respect to data granularity, scheme information and the embedded semantics. However, most traditional distributed knowledge discovery (DKD) methods assume that the distributed databases derive from a single virtual global table, where they share the same semantics and data structures. This data heterogeneity and the underlying semantics bring a considerable challenge for DKD. In this paper, we propose a model-based clustering method for aggregate databases, where the heterogeneous schema structure is due to the heterogeneous classification schema. The underlying semantics can be captured by different clusters. The clustering is carried out via a mixture model, where each component of the mixture corresponds to a different virtual global table. An advantage of our approach is that the algorithm resolves the heterogeneity as part of the clustering process without previously having to homogenise the heterogeneous local schema to a shared schema. Evaluation of the algorithm is carried out using both real and synthetic data. Scalability of the algorithm is tested against the number of databases to be clustered; the number of clusters; and the size of the databases. The relationship between performance and complexity is also evaluated. Our experiments show that this approach has good potential for scalable integration of semantically heterogeneous databases.
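The abstract does not give the likelihood or the update equations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each database reports counts over its own grouping of a shared set of fine categories, that each mixture component is a multinomial over those fine categories (standing in for one "virtual global table"), and that EM treats both the cluster membership and the unobserved fine-level counts as missing data. The function names (`log_joint`, `fit_once`, `cluster_aggregates`), the 0/1 scheme matrices, and the toy counts are all hypothetical.

```python
import numpy as np

def log_joint(schemes, counts, theta, pi):
    """log(pi_k) plus each database's multinomial log-likelihood per component."""
    out = np.empty((len(schemes), len(pi)))
    for d, (A, n) in enumerate(zip(schemes, counts)):
        group_prob = theta @ A.T              # (components, groups of database d)
        out[d] = np.log(pi) + np.log(group_prob) @ n
    return out

def fit_once(schemes, counts, n_clusters, n_iter, rng):
    """One EM run from a random start; returns (loglik, resp, theta, pi)."""
    n_fine = schemes[0].shape[1]
    theta = rng.dirichlet(np.ones(n_fine), size=n_clusters)
    pi = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each database.
        lj = log_joint(schemes, counts, theta, pi)
        resp = np.exp(lj - lj.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: spread each grouped count over its fine categories in
        # proportion to the current theta, weighted by responsibility.
        new_theta = np.full_like(theta, 1e-6)  # small smoothing, an assumption
        for d, (A, n) in enumerate(zip(schemes, counts)):
            group_prob = theta @ A.T
            expected_fine = theta * ((n / group_prob) @ A)
            new_theta += resp[d][:, None] * expected_fine
        theta = new_theta / new_theta.sum(axis=1, keepdims=True)
        pi = np.clip(resp.mean(axis=0), 1e-12, None)
        pi /= pi.sum()
    lj = log_joint(schemes, counts, theta, pi)
    m = lj.max(axis=1, keepdims=True)
    loglik = float((m[:, 0] + np.log(np.exp(lj - m).sum(axis=1))).sum())
    resp = np.exp(lj - m)
    resp /= resp.sum(axis=1, keepdims=True)
    return loglik, resp, theta, pi

def cluster_aggregates(schemes, counts, n_clusters, n_restarts=20, n_iter=100, seed=0):
    """Keep the best of several random restarts, since EM only finds local optima."""
    rng = np.random.default_rng(seed)
    runs = [fit_once(schemes, counts, n_clusters, n_iter, rng) for _ in range(n_restarts)]
    return max(runs, key=lambda run: run[0])

# Toy data: four fine categories, three different classification schemes.
fine = np.eye(4)
halves = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
first_vs_rest = np.array([[1, 0, 0, 0], [0, 1, 1, 1]], dtype=float)
schemes = [fine, halves, first_vs_rest, fine, halves, first_vs_rest]
counts = [np.array([40, 40, 10, 10]), np.array([80, 20]), np.array([40, 60]),
          np.array([10, 10, 40, 40]), np.array([20, 80]), np.array([10, 90])]

loglik, resp, theta, pi = cluster_aggregates(schemes, counts, n_clusters=2)
labels = resp.argmax(axis=1)   # cluster ids are arbitrary and may be swapped
print(labels)
print(theta.round(2))
```

In this sketch the databases are never mapped to a shared schema beforehand: each scheme matrix only states which fine categories a reported group covers, and the grouped counts are redistributed inside the M-step. The random restarts are a design choice of the sketch, added because a single EM run can stall in a poor local optimum.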

Bibliographic details

Source
British National Conference on Databases (BNCOD 24); 3-5 July 2007; Glasgow (GB) | 2007 | pp. 190-202 | 13 pages
Conference location: Glasgow (GB)
Authors
Shuai Zhang; Sally McClean; Bryan Scotney;
Author affiliation

School of Computing and Information Engineering, University of Ulster, Coleraine, Northern Ireland, UK;

Original format: PDF
Language of text: English
Chinese Library Classification: TP311.13
Keywords
model-based clustering; semantically heterogeneous databases; EM algorithm;

Similar literature

1. Clustering semantically heterogeneous distributed aggregate databases [J]. Shuai Zhang, Sally I. McClean, Bryan W. Scotney. Knowledge and Information Systems. 2014, No. 2
2. Integrating Semantically Heterogeneous Aggregate Views of Distributed Databases [J]. Sally McClean, Bryan Scotney, Philip Morrow. Distributed and Parallel Databases. 2008, No. 1-3
3. Integration and Querying of Heterogeneous Omics Semantic Annotations for Biomedical and Biomolecular Knowledge Discovery [J]. Irshad Omer, Khan Muhammad Usman Ghani. Current Bioinformatics. 2020, No. 1
4. Knowledge Discovery from Semantically Heterogeneous Aggregate Databases Using Model-Based Clustering [C]. Shuai Zhang, Sally McClean, Bryan Scotney. British National Conference on Databases. 2007
5. Discovery of characteristic knowledge in databases using cluster analysis and genetic programming [D]. Ryu, Tae-wan. 1998
6. Semantic Health Knowledge Graph: Semantic Integration of Heterogeneous Medical Knowledge and Services [O]. Longxiang Shi, Shijian Li, Xiaoran Yang. 2006
7. Utilizing integrity constraint knowledge in heterogeneous databases: A methodology for schema integration and semantic query processing [O]. Venkataraman Ramesh. 1995
