Journal: Computer science

Towards task-based parallelization for entity resolution


Abstract

Entity resolution (ER) is the problem of determining which virtual representations in one or more data sources refer to the same real-world entity. A central question in ER is how to find matching entity representations (so-called duplicates) efficiently and scalably. One general technique for addressing these issues is parallelization. Almost all work on parallel ER focuses on data parallelism; this paper instead focuses on task parallelism. Task parallelism makes it possible to support incremental ER, which computes the solution incrementally by streaming the results of intermediate ER stages as soon as they are computed. This may yield results in a more timely fashion and can also be useful in a service-oriented setting with a limited time or monetary budget. In summary, this paper presents a framework for task parallelization of ER that supports, in particular, ER over large amounts of semi-structured and heterogeneous data. We also discuss a possible implementation of our framework.
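To make the idea of streaming intermediate results concrete, the following is a minimal illustrative sketch, not the framework proposed in the paper. It assumes a two-stage pipeline (a blocking stage and a pairwise matching stage) running as concurrent tasks connected by queues. The names `blocking_stage`, `matching_stage`, and `resolve_incrementally`, the first-letter blocking key, and the 0.8 similarity threshold are all hypothetical choices made for illustration.

```python
import queue
import threading
from difflib import SequenceMatcher

DONE = object()  # sentinel marking the end of a stream


def blocking_stage(records, out_q):
    """Emit candidate pairs that share a cheap blocking key (assumed: first letter)."""
    blocks = {}
    for rec_id, name in records:
        key = name[:1].lower()
        members = blocks.setdefault(key, [])
        # Pair the new record with every earlier record in the same block
        # and hand each pair downstream immediately.
        for other in members:
            out_q.put((other, (rec_id, name)))
        members.append((rec_id, name))
    out_q.put(DONE)


def matching_stage(in_q, out_q, threshold=0.8):
    """Compare candidate pairs as they arrive and forward likely duplicates."""
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)
            return
        (id_a, name_a), (id_b, name_b) = item
        score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
        if score >= threshold:
            out_q.put((id_a, id_b))


def resolve_incrementally(records):
    """Yield duplicate pairs as soon as the pipeline produces them."""
    candidates, matches = queue.Queue(), queue.Queue()
    stages = [
        threading.Thread(target=blocking_stage, args=(records, candidates), daemon=True),
        threading.Thread(target=matching_stage, args=(candidates, matches), daemon=True),
    ]
    for stage in stages:
        stage.start()
    while True:
        pair = matches.get()
        if pair is DONE:
            break
        yield pair
    for stage in stages:
        stage.join()


if __name__ == "__main__":
    people = [(1, "Jon Smith"), (2, "John Smith"), (3, "Alice Jones")]
    for duplicate in resolve_incrementally(people):
        print(duplicate)
```

Because each stage is a separate task, a consumer can stop reading from `resolve_incrementally` once a time or cost budget is exhausted and still keep the duplicates found so far. In CPython, threads illustrate the task structure rather than provide a CPU speedup.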
