Conference: International Conference on Very Large Data Bases

Question Selection for Crowd Entity Resolution

Abstract

We study the problem of enhancing Entity Resolution (ER) with the help of crowdsourcing. ER is the problem of clustering records that refer to the same real-world entity and can be an extremely difficult process for computer algorithms alone. For example, figuring out which images refer to the same person can be a hard task for computers, but an easy one for humans. We study the problem of resolving records with crowdsourcing where we ask questions to humans in order to guide ER into producing accurate results. Since human work is costly, our goal is to ask as few questions as possible. We propose a probabilistic framework for ER that can be used to estimate how much ER accuracy we obtain by asking each question and select the best question with the highest expected accuracy. Computing the expected accuracy is #P-hard, so we propose approximation techniques for efficient computation. We evaluate our best question algorithms on real and synthetic datasets and demonstrate how we can obtain high ER accuracy while significantly reducing the number of questions asked to humans.
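The question-selection idea in the abstract can be illustrated with a small brute-force sketch. This is not the authors' algorithm: it assumes independent pairwise match probabilities, crowd answers that are always correct, and pairwise agreement (Rand index) as the accuracy measure. The names `world_distribution`, `best_expected_accuracy`, and `select_question` are hypothetical. It enumerates every clustering, which is exponential in the number of records and so only workable for a handful of them; that cost is the reason approximation techniques are needed in practice.

```python
from itertools import combinations


def partitions(items):
    """Yield every partition of `items` as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing cluster in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a new cluster of its own.
        yield [[first]] + part


def same_cluster(part, a, b):
    """True if records a and b share a cluster in `part`."""
    return any(a in c and b in c for c in part)


def world_distribution(records, match_prob):
    """Normalized probability of each clustering (possible world).

    Assumption: a world's weight is the product of independent pair
    probabilities, renormalized over transitively consistent clusterings.
    """
    pairs = list(combinations(records, 2))
    worlds = []
    for part in partitions(records):
        w = 1.0
        for a, b in pairs:
            p = match_prob[(a, b)]
            w *= p if same_cluster(part, a, b) else 1.0 - p
        worlds.append((part, w))
    total = sum(w for _, w in worlds)
    return [(part, w / total) for part, w in worlds]


def rand_accuracy(result, truth, pairs):
    """Fraction of record pairs on which two clusterings agree."""
    agree = sum(
        same_cluster(result, a, b) == same_cluster(truth, a, b)
        for a, b in pairs
    )
    return agree / len(pairs)


def best_expected_accuracy(dist, records):
    """Expected accuracy of the best single ER output under `dist`."""
    pairs = list(combinations(records, 2))
    return max(
        sum(w * rand_accuracy(result, truth, pairs) for truth, w in dist)
        for result in partitions(records)
    )


def select_question(records, match_prob):
    """Return ((a, b), score) for the pair with the highest expected accuracy.

    Assumption: the crowd answers "same entity?" correctly, so an answer
    simply conditions the distribution over possible worlds.
    """
    dist = world_distribution(records, match_prob)
    best = None
    for a, b in combinations(records, 2):
        yes = [(p, w) for p, w in dist if same_cluster(p, a, b)]
        no = [(p, w) for p, w in dist if not same_cluster(p, a, b)]
        score = 0.0
        for branch in (yes, no):
            mass = sum(w for _, w in branch)  # probability of this answer
            if mass > 0:
                cond = [(p, w / mass) for p, w in branch]
                score += mass * best_expected_accuracy(cond, records)
        if best is None or score > best[1]:
            best = ((a, b), score)
    return best


# Illustrative input only; keys follow the order of combinations(records, 2).
records = ["r1", "r2", "r3"]
match_prob = {("r1", "r2"): 0.9, ("r1", "r3"): 0.5, ("r2", "r3"): 0.4}
question, score = select_question(records, match_prob)
print(question, round(score, 3))
```

Because the score for each question averages the best achievable accuracy over both possible answers, it can never be lower than the accuracy achievable without asking anything; the selected question is the one with the largest such gain.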