Conference: 2016 IEEE 32nd International Conference on Data Engineering Workshops

Playing LEGO with JSON: Probabilistic joins over attribute-value fragments


Abstract

Information about an entity can rarely be assumed to reside in one single document created at a single point in time. Rather, it is reasonable to assume that information is spread over multiple documents and created or enriched over time, for instance through crowdsourced facts or facts mined from social network streams, one after the other. In this work, we consider the problem of assembling entity-centric information from input comprising small pieces of information, provided in the form of JSON document snippets. The final goal is to create a document that describes an entity, possibly in full, by putting related fragments together. What makes this task challenging is the lack of evidence indicating which fragments belong together and, hence, can be safely combined. We focus on deciding this question using statistics of the already seen fragments to judge whether a join is reasonable. We evaluate our approach on real-world datasets and show that it achieves high precision and recall.
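To make the idea of a statistics-driven join decision concrete, here is a minimal sketch. It is not the paper's method: the abstract does not specify the scoring model, so the rule below (two fragments are join candidates only if they share at least one attribute with no conflicting values, and a rarer shared attribute-value pair counts as stronger evidence) is an assumption chosen purely for illustration. The names `pair_counts`, `join_score`, and `try_join`, the `threshold` parameter, and the sample fragments are all hypothetical, and fragments are assumed to be flat JSON objects with scalar values.

```python
from collections import Counter


def pair_counts(fragments):
    """Count how many seen fragments contain each (attribute, value) pair."""
    counts = Counter()
    for frag in fragments:
        counts.update(frag.items())
    return counts


def join_score(a, b, counts, total):
    """Illustrative score in [0, 1]; 0.0 means 'do not join'.

    Assumption: the rarest shared (attribute, value) pair among the
    already seen fragments is the strongest evidence for a join.
    """
    shared = a.keys() & b.keys()
    # No overlap, or a conflicting value on a shared attribute: no join.
    if not shared or any(a[k] != b[k] for k in shared):
        return 0.0
    rarest = min(counts[(k, a[k])] / total for k in shared)
    return 1.0 - rarest


def try_join(a, b, counts, total, threshold=0.5):
    """Merge two fragments if the score reaches the (hypothetical) threshold."""
    if join_score(a, b, counts, total) >= threshold:
        return {**a, **b}
    return None


# Hypothetical fragments, standing in for parsed JSON snippets.
seen = [
    {"name": "Ada Lovelace", "born": 1815},
    {"name": "Ada Lovelace", "field": "mathematics"},
    {"field": "mathematics", "country": "UK"},
    {"field": "mathematics", "born": 1777},
    {"name": "Carl Gauss", "field": "mathematics"},
]
counts = pair_counts(seen)

merged = try_join(seen[0], seen[1], counts, len(seen))
rejected = try_join(seen[1], seen[2], counts, len(seen))
```

In this toy data, the first two fragments share a name that occurs in only two of the five fragments, so they are merged. The second and third share only the very common value `"mathematics"`, which is weak evidence, so that join is rejected. A real system would need a more principled probabilistic model than this single-pair heuristic.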
