Venue: 2011 IEEE 27th International Conference on Data Engineering Workshops

Efficient entity resolution methods for heterogeneous information spaces



Abstract

The Web of Data encompasses a voluminous and constantly expanding collection of structured and semi-structured data sets. An important prerequisite for leveraging them is the detection (and merging) of information that describes the same real-world entities, a task known as Entity Resolution. To enhance the efficiency of this quadratic task, blocking techniques are typically employed. They are, however, inapplicable to the Web of Data, due to its noise, loose schema binding, and unprecedented heterogeneity. In the context of my thesis, I focus on developing novel blocking methods that scale up Entity Resolution within such large, noisy, and heterogeneous information spaces. At their core lies an attribute-agnostic mechanism that relies exclusively on the values of entity profiles in order to build blocks effectively. The resulting set of blocks is processed efficiently by intelligent techniques that minimize the required number of comparisons. Any combination of block building and block processing methods is possible, allowing for high flexibility of the overall approach. Initial experimental studies on large, real-world data sets have produced promising results.
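
The abstract does not spell out the blocking mechanism, so the following is only a minimal sketch of one way an attribute-agnostic scheme could work. It assumes a simple token-based variant: every token found in any attribute value becomes a block key, and attribute names are ignored. The function names `build_blocks` and `candidate_pairs`, the sample profiles, and the "smallest block first" processing order are all hypothetical choices made for illustration, not the method of the thesis.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_blocks(profiles):
    """Block building (assumed token-based): one block per token in any value."""
    blocks = defaultdict(set)
    for entity_id, profile in profiles.items():
        for value in profile.values():  # attribute names are deliberately ignored
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(entity_id)
    # A block holding a single entity produces no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Block processing (assumed): visit small blocks first, emit each pair once."""
    seen = set()
    for token in sorted(blocks, key=lambda t: (len(blocks[t]), t)):
        for pair in combinations(sorted(blocks[token]), 2):
            if pair not in seen:  # skip comparisons repeated across blocks
                seen.add(pair)
                yield pair


# Hypothetical profiles with different schemas describing overlapping entities.
profiles = {
    "e1": {"name": "Jane Smith", "city": "Hannover"},
    "e2": {"fullName": "Smith, Jane", "location": "Hannover, Germany"},
    "e3": {"title": "John Doe", "addr": "Athens"},
}

blocks = build_blocks(profiles)
pairs = list(candidate_pairs(blocks))
print(pairs)
```

In this toy input, e1 and e2 share tokens even though their attribute names differ, so they land in common blocks, while e3 shares none and is never compared. The pair (e1, e2) appears in several blocks but is emitted once, which illustrates the general idea of reducing the quadratic number of comparisons without relying on schema information.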
