Published in: Proceedings of the Seventh Americas Conference on Information Systems

CLUSTERING DATABASE OBJECTS FOR SEMANTIC INTEGRATION OF HETEROGENEOUS DATABASES


Abstract

Interschema Relationship Identification (IRI), i.e., determining the relationships between objects in heterogeneous database schemas, is critical to both the classical schema integration problem and the data cleansing and consolidation phase that precedes data warehouse development. In this paper we propose a cluster analysis-based approach to semi-automate the IRI process, which is typically very time-consuming and requires extensive human interaction. We apply multiple clustering techniques, including K-means, hierarchical clustering, and Self-Organizing Map (SOM), to identify similar database objects from heterogeneous databases based on a combination of features such as object names, documentation, schematic information, data contents, and usage patterns. Initial experimental results indicate that our approach performs better than existing approaches in the accuracy of identified interschema relationships. In addition, a prototype system we have developed provides users a visualization tool for the display of clustering results as well as for the incremental evaluation of candidate solutions.
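
As a rough illustration of the hierarchical clustering idea mentioned in the abstract, the sketch below groups attributes from two schemas using only their names. It is not the authors' method: the abstract does not describe the feature encoding or distance measure, so the character-trigram Jaccard distance, the single-linkage merge rule, the 0.6 threshold, and the `crm`/`erp` schema and attribute names are all assumptions made for this example.

```python
from itertools import combinations


def trigrams(name):
    """Character trigrams of a lower-cased, space-padded identifier."""
    s = f"  {name.lower().replace('_', ' ')} "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def single_link_clusters(objects, threshold):
    """Agglomerative single-linkage clustering on attribute names.

    Repeatedly merges the two closest clusters and stops once the
    closest pair is farther apart than `threshold`.
    """
    # Only the part after the last dot (the attribute name) is compared.
    feats = {o: trigrams(o.split(".")[-1]) for o in objects}
    clusters = [[o] for o in objects]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(
                jaccard_distance(feats[x], feats[y])
                for x in clusters[i]
                for y in clusters[j]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical attributes from two hypothetical schemas.
objects = [
    "crm.customer_name",
    "crm.order_date",
    "erp.cust_name",
    "erp.order_dt",
]
clusters = single_link_clusters(objects, threshold=0.6)
for group in clusters:
    print(group)
```

Each resulting group is a candidate set of corresponding objects for a human to confirm or reject, which matches the semi-automated role the abstract assigns to clustering. A fuller version would combine name similarity with the other feature types the abstract lists (documentation, schematic information, data contents, usage patterns).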
