Conference: International Conference on Computational Science and Its Applications

Clustering-Based Schema Matching of Web Data for Constructing Digital Library

Abstract

The abundance of information on the web has attracted much research on reusing valuable web data in other information applications, such as digital libraries. Because web information is published by various contributors in different ways, schema matching is a basic problem in integrating heterogeneous data sources. Web information integration raises new challenges in two respects: web data lack a complete schema definition, and schema matching between web data cannot be reduced to a 1-1 mapping problem. In this paper we propose an algorithm, COSM, to automate the schema matching process for web data. The matching process is transformed into a clustering problem: data elements grouped into the same cluster are regarded as mapping to one another. COSM is mainly an instance-level matching approach, combined with a partial name matcher when calculating the distance metrics between elements. A data pretreatment step is carried out before clustering to give rational distance metrics between elements. Experiments testing the algorithm, together with its application in the construction of a Chinese folk music digital library, demonstrate the algorithm's efficiency.
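The general idea of treating schema matching as clustering can be illustrated with a minimal Python sketch. This is not the COSM algorithm, whose details the abstract does not give. It assumes a Jaccard distance over instance-value tokens as the instance-level matcher, a `difflib` string similarity as the name matcher, a fixed weighting between the two, and single-link clustering with a distance threshold. All function names, parameters, and sample data are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def instance_distance(a_values, b_values):
    """Jaccard distance between the token sets of two elements' instance values.

    Lowercasing and whitespace tokenizing stand in for a pretreatment step
    (an assumption; the paper's actual pretreatment is not described here).
    """
    a_tokens = {tok.lower() for v in a_values for tok in v.split()}
    b_tokens = {tok.lower() for v in b_values for tok in v.split()}
    if not a_tokens and not b_tokens:
        return 1.0  # no evidence at all: treat as maximally distant
    return 1.0 - len(a_tokens & b_tokens) / len(a_tokens | b_tokens)


def name_distance(a_name, b_name):
    """Distance between element names, based on difflib's similarity ratio."""
    return 1.0 - SequenceMatcher(None, a_name.lower(), b_name.lower()).ratio()


def element_distance(a, b, name_weight=0.3):
    """Weighted mix of instance-level and name-level distance (weights assumed)."""
    (a_name, a_values), (b_name, b_values) = a, b
    return ((1.0 - name_weight) * instance_distance(a_values, b_values)
            + name_weight * name_distance(a_name, b_name))


def cluster_elements(elements, threshold=0.6, name_weight=0.3):
    """Single-link clustering: link any pair closer than `threshold`,
    then return the connected groups (tracked with union-find)."""
    parent = list(range(len(elements)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(elements)), 2):
        if element_distance(elements[i], elements[j], name_weight) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, (name, _) in enumerate(elements):
        groups.setdefault(find(i), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical elements from several web sources: (element name, sample values).
elements = [
    ("title", ["Jasmine Flower", "Liuyang River"]),
    ("song_title", ["Jasmine Flower", "Liuyang River", "Kangding Love Song"]),
    ("name", ["Kangding Love Song", "Jasmine Flower"]),
    ("region", ["Jiangsu", "Hunan"]),
    ("origin", ["Jiangsu", "Hunan", "Sichuan"]),
]

clusters = cluster_elements(elements)
print(clusters)
```

With this sample data, `title`, `song_title`, and `name` end up in one cluster and `region` and `origin` in another. Because a cluster can hold any number of elements, the result is not restricted to 1-1 mappings.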