Data Engineering Workshops (ICDEW), 2010

Towards better entity resolution techniques for Web document collections

Abstract

Because person names are not unique, the same name on different Web pages may or may not refer to the same real-world person. This entity identification problem is one of the most challenging issues in realizing the Semantic Web or entity-oriented search. We address this disambiguation problem, which is very similar to the entity resolution problem studied in relational databases, although there are also several differences. Most importantly, Web pages often contain only partial or incomplete information about the persons, and the available information is highly heterogeneous. As a result, similarity functions can provide only uncertain evidence about whether two occurrences of a name refer to the same person. Each similarity function captures only some aspects of the similarity between the Web pages on which the names occur, so the functions perform very differently for different names. We analyze several data engineering techniques to cope with the limited accuracy of the similarity functions and to combine multiple functions. Even with these simple techniques, we demonstrate systematic performance improvements and obtain results comparable to state-of-the-art methods.
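The abstract does not specify which similarity functions are used or how they are combined. The following is only a minimal sketch of the general idea, not the authors' method. It makes several assumptions, all hypothetical: each page is a mapping from a feature name (for example, surrounding words or co-occurring names) to a set of strings; each per-feature similarity is Jaccard overlap; the functions are combined by a weighted average; and pages whose combined score reaches a fixed threshold are linked and then clustered by transitive closure (union-find). The names `Page`, `combined_similarity`, `resolve`, `weights`, and `threshold` are illustrative.

```python
from itertools import combinations

# Hypothetical page representation: feature name -> set of extracted strings.
Page = dict[str, set[str]]


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two feature sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(p: Page, q: Page, weights: dict[str, float]) -> float:
    """Weighted average of per-feature Jaccard scores (an assumed combination rule)."""
    total = sum(weights.values())
    score = sum(w * jaccard(p.get(f, set()), q.get(f, set()))
                for f, w in weights.items())
    return score / total


def resolve(pages: list[Page], weights: dict[str, float], threshold: float) -> list[int]:
    """Link every pair whose combined score is >= threshold, then take the
    transitive closure with union-find. Returns one cluster label per page."""
    parent = list(range(len(pages)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(pages)), 2):
        if combined_similarity(pages[i], pages[j], weights) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters
    return [find(i) for i in range(len(pages))]
```

For example, two pages sharing the words "database" and "sql" and the co-occurring name "alice smith" would receive the same label, while a page about guitars and concerts would not. A single global threshold is a simplification. Because the abstract notes that similarity functions behave very differently across names, a practical system might tune weights or thresholds per name.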