Conference proceedings: The semantic web: research and applications.

Unsupervised Learning of Link Discovery Configuration


Abstract

Links between overlapping datasets on the Web are generally discovered using fuzzy similarity measures. Configuring such measures is often a non-trivial task that depends on the domain, the ontological schemas, and the formatting conventions of the data. Existing solutions rely either on the user's knowledge of the data and the domain or on machine learning to discover these parameters from training data. In this paper, we present a novel approach to data linking that relies on the unsupervised discovery of the required similarity parameters. Instead of using labeled data, the method takes into account several desired properties that the distribution of output similarity values should satisfy. It incorporates these properties into a fitness criterion used in a genetic algorithm to establish similarity parameters that maximise the quality of the resulting linkset according to the considered properties. Experiments on benchmarks as well as real-world datasets show that such an unsupervised method can reach the same levels of performance as manually engineered methods, and show how the different parameters of the genetic algorithm and the fitness criterion affect the results for different datasets.
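
The general idea in the abstract, a genetic algorithm that tunes similarity parameters against a label-free fitness criterion, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy records, the two-gene genome (a mixing weight and a link threshold), the `similarity` function, and in particular the `fitness` function, which is a hypothetical pseudo F-measure that rewards linksets in which many source records are linked and few of them are linked more than once.

```python
import random
import re
from difflib import SequenceMatcher

# Toy records (hypothetical); in practice these would be labels from two datasets.
SOURCE = ["Alan Turing", "Grace Hopper", "Ada Lovelace", "Edsger Dijkstra"]
TARGET = ["Turing, Alan", "Hopper, Grace M.", "Lovelace, Ada", "Donald Knuth"]


def token_jaccard(a, b):
    """Jaccard overlap of lower-cased word tokens."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similarity(a, b, weight):
    """Weighted mix of token overlap and character-level similarity."""
    char_sim = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return weight * token_jaccard(a, b) + (1.0 - weight) * char_sim


def linkset(genome):
    """All (source, target) pairs whose similarity reaches the threshold."""
    weight, threshold = genome
    return [(s, t) for s in SOURCE for t in TARGET
            if similarity(s, t, weight) >= threshold]


def fitness(genome):
    """Hypothetical unsupervised criterion: a pseudo F-measure, no labels used."""
    links = linkset(genome)
    if not links:
        return 0.0
    linked_sources = {s for s, _ in links}
    pseudo_precision = len(linked_sources) / len(links)  # penalises one-to-many links
    pseudo_recall = len(linked_sources) / len(SOURCE)    # share of sources linked
    return (2 * pseudo_precision * pseudo_recall
            / (pseudo_precision + pseudo_recall))


def evolve(generations=30, pop_size=20, seed=0):
    """Small genetic algorithm over (weight, threshold) genomes in [0, 1]^2."""
    rng = random.Random(seed)
    population = [(rng.random(), rng.random()) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best genomes over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Uniform crossover, then Gaussian mutation clipped to [0, 1].
            child = tuple(
                min(1.0, max(0.0, rng.choice(pair) + rng.gauss(0.0, 0.05)))
                for pair in zip(p1, p2)
            )
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("best genome:", best, "fitness:", fitness(best))
    print("links:", linkset(best))
```

The sketch never consults a gold standard: the search is driven only by properties of the produced linkset. On toy data such a criterion may also reward incorrect links that happen to look one-to-one, so it should be read as an illustration of the search loop and not as a usable quality measure.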
