European Semantic Web Conference

Natural Language Inference over Tables: Enabling Explainable Data Exploration on Data Lakes


Abstract

Data lakes are repositories of data with potential for analysis. Data lakes aim to liberate data from silos, thereby enabling cross-cutting analyses that were hitherto out of reach. This poses significant challenges for data scientists, who must first discover which data sets may be relevant to the task in hand. Several indexing schemes have been proposed that, given a data set of interest, can identify related data sets. However, such schemes tend to build on similarity metrics that stop short of explaining how an identified data set relates to a given target. We address this problem by applying Natural Language Inference (NLI) to explain how the attributes of discovered data sets relate to those of the target, in terms of a collection of semantic relations. We provide two approaches to inferring semantic relations: (a) unsupervised intensional and extensional analysis of the data sources using Natural Language Processing techniques; and (b) supervised learning of semantic relations by applying BERT to source schema information. The contributions of this paper are: an NLI strategy that explicitly characterises the semantic relations between data sets; two approaches to inferring these semantic relations; and an empirical evaluation of the approaches using open government data.
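Approach (a) includes an extensional analysis, that is, a comparison of the instance values held by attributes. As a rough illustration only, the sketch below compares the value sets of a source attribute and a target attribute and assigns one of a few coarse relation labels. The label set (`equivalent`, `subsumes`, `subsumed_by`, `overlaps`, `disjoint`), the hypothetical function names and the plain set-containment test are assumptions made for this example and are not the paper's method; the intensional analysis and the BERT-based approach (b) are not shown.

```python
from collections.abc import Iterable


def normalise(values: Iterable[str]) -> set[str]:
    """Hypothetical helper: lower-case and trim values, dropping blanks."""
    return {v.strip().lower() for v in values if v and v.strip()}


def extensional_relation(source: Iterable[str], target: Iterable[str]) -> str:
    """Hypothetical coarse relation between two attributes' value sets.

    Assumed labels (not taken from the paper):
      equivalent  - identical value sets
      subsumed_by - every source value also occurs in the target
      subsumes    - every target value also occurs in the source
      overlaps    - some, but not all, values are shared
      disjoint    - no shared values
    """
    s, t = normalise(source), normalise(target)
    if not s or not t:
        return "unknown"  # nothing to compare
    if s == t:
        return "equivalent"
    if s < t:  # proper subset: source is narrower than target
        return "subsumed_by"
    if t < s:  # proper superset: source is broader than target
        return "subsumes"
    if s & t:
        return "overlaps"
    return "disjoint"


if __name__ == "__main__":
    towns_a = ["London", "Leeds", "York"]
    towns_b = ["london", "leeds ", "york", "Bath"]
    print(extensional_relation(towns_a, towns_b))  # subsumed_by
```

A label of this kind says more than a single similarity score: it states the direction of containment, which is the sort of explanation the abstract argues similarity-based indexing schemes fail to provide.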