Conference: International Conference on Database Systems for Advanced Applications

PBA: Partition and Blocking Based Alignment for Large Knowledge Bases



Abstract

The rapid development of the Semantic Web has enabled the creation of a growing number of large-scale knowledge bases across various domains. Because different knowledge bases contain overlapping and complementary information, automatically integrating them by aligning their classes and instances can improve their quality and coverage. Existing knowledge-base alignment algorithms have three limitations: (1) they do not scale, (2) they yield low alignment quality, and (3) they are not fully automatic. To address these limitations, we develop a scalable partition-and-blocking based alignment framework, named PBA, which can automatically and efficiently align knowledge bases with tens of millions of instances. PBA consists of three steps. (1) Partition: we propose a new hierarchical agglomerative co-clustering algorithm that partitions the class hierarchy of the knowledge base into multiple class partitions. (2) Blocking: we divide the instances in the same class partition into small blocks to further improve performance. (3) Alignment: we compute the similarity of the instances in each block using a vector space model and align the instances with high similarity. Experimental results on real and synthetic datasets show that our algorithm significantly outperforms state-of-the-art approaches in efficiency, by as much as an order of magnitude, while maintaining high alignment quality.
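
As a rough illustration of the blocking and alignment steps only, the sketch below groups instances into blocks and compares just the pairs that share a block, using cosine similarity over term-count vectors. It is not the authors' implementation: the shared-token blocking key, the whitespace tokenizer, the raw term-count weighting, the `threshold` value, and all function names are assumptions made for this example, and the partition step (co-clustering of the class hierarchy) is omitted.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def tokens(text):
    """Lowercase whitespace tokenization (an assumption, not from the paper)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_blocks(kb1, kb2):
    """Group instances of both KBs by shared token (hypothetical blocking key)."""
    blocks = defaultdict(lambda: (set(), set()))
    for side, kb in enumerate((kb1, kb2)):
        for inst_id, description in kb.items():
            for tok in set(tokens(description)):
                blocks[tok][side].add(inst_id)
    return blocks


def align(kb1, kb2, threshold=0.5):
    """Score only pairs that co-occur in a block; keep those at or above threshold."""
    vec1 = {i: Counter(tokens(d)) for i, d in kb1.items()}
    vec2 = {i: Counter(tokens(d)) for i, d in kb2.items()}
    scores = {}
    for left, right in build_blocks(kb1, kb2).values():
        for a, b in product(left, right):
            if (a, b) not in scores:  # a pair may share several blocks
                scores[(a, b)] = cosine(vec1[a], vec2[b])
    return {pair: s for pair, s in scores.items() if s >= threshold}
```

For example, calling `align({"a1": "Barack Obama president"}, {"b1": "Barack Obama politician"})` compares the two instances because they share the blocks for "barack" and "obama". Pairs with no token in common are never scored, which is where blocking saves work compared with an all-pairs comparison.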
