Personalised Exploration Graphs on Semantic Data Lakes

Abstract

Organisations operating in the context of Smart Cities have recently been spending time and resources on turning large amounts of data, collected from heterogeneous sources, into actionable insights, using indicators as powerful tools for meaningful data aggregation and exploration. Data lakes, which follow a schema-on-read approach, allow for storing both structured and unstructured data and have been proposed as flexible repositories for enabling data exploration and analysis over heterogeneous data sources, regardless of their structure. However, indicators are usually computed over centralised data storage, following a less flexible schema-on-write approach. Furthermore, domain experts, who know the data stored within the data lake, are usually distinct from data analysts, who define indicators, and from users, who exploit indicators to explore data in a personalised way. In this paper, we propose a semantics-based approach for enabling personalised data lake exploration through the conceptualisation of suitable indicators. In particular, the approach is structured as follows: (i) at the bottom, heterogeneous data sources within a data lake are enriched with Semantic Models, defined by domain experts using domain ontologies, to provide a semantic data lake representation; (ii) in the middle, a Multi-Dimensional Ontology is used by analysts to define indicators and analysis dimensions, in terms of concepts within Semantic Models and formulas to aggregate them; (iii) at the top, Personalised Exploration Graphs are generated for different categories of users, whose profiles are defined in terms of a set of constraints that limit the indicator instances on which the users may rely to explore data. Benefits and limitations of the approach are discussed through an application in the Smart City domain.
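The three layers described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain dictionaries stand in for RDF/OWL Semantic Models and the Multi-Dimensional Ontology, and every name in it (the `ex:` concept identifiers, the `SOURCES` data, the `AvgCO2` indicator, and the functions `semantic_view`, `indicator_instances` and `exploration_graph`) is hypothetical. The aggregation formula (a mean), the sample values, and the rule for linking graph nodes (two indicator instances are connected when they differ in exactly one dimension member) are likewise assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# --- Bottom layer: heterogeneous sources annotated with semantic models ---
# Each hypothetical semantic model maps a source-specific field name to a
# concept of an assumed domain ontology, so sources can be read uniformly.
SOURCES = {
    "traffic_sensors": {
        "model": {"zone": "ex:District", "yr": "ex:Year", "co2_ppm": "ex:CO2Level"},
        "records": [
            {"zone": "Centre", "yr": 2019, "co2_ppm": 420.0},
            {"zone": "Centre", "yr": 2019, "co2_ppm": 440.0},
            {"zone": "North", "yr": 2019, "co2_ppm": 400.0},
            {"zone": "Centre", "yr": 2020, "co2_ppm": 410.0},
        ],
    },
    "open_data_csv": {
        "model": {"district": "ex:District", "year": "ex:Year", "CO2": "ex:CO2Level"},
        "records": [{"district": "North", "year": 2020, "CO2": 390.0}],
    },
}


def semantic_view(sources):
    """Schema-on-read: rewrite every record in terms of ontology concepts."""
    for src in sources.values():
        model = src["model"]
        for rec in src["records"]:
            yield {model[field]: value for field, value in rec.items() if field in model}


# --- Middle layer: indicator = aggregation formula + analysis dimensions ---
# Hypothetical indicator definition; mean() is an assumed formula.
INDICATOR = {
    "name": "AvgCO2",
    "measure": "ex:CO2Level",
    "formula": mean,
    "dimensions": ("ex:District", "ex:Year"),
}


def indicator_instances(indicator, facts):
    """Compute one indicator value per combination of dimension members."""
    groups = defaultdict(list)
    for fact in facts:
        key = tuple(fact[d] for d in indicator["dimensions"])
        groups[key].append(fact[indicator["measure"]])
    return {key: indicator["formula"](vals) for key, vals in groups.items()}


# --- Top layer: personalised exploration graph from profile constraints ---
def exploration_graph(instances, dimensions, constraints):
    """Keep instances allowed by the profile; link those differing in one dimension."""
    allowed = [
        key for key in instances
        if all(key[dimensions.index(d)] in ok for d, ok in constraints.items())
    ]
    # Assumed navigation rule: neighbouring slices differ in exactly one member.
    edges = [
        (a, b) for a, b in combinations(sorted(allowed), 2)
        if sum(x != y for x, y in zip(a, b)) == 1
    ]
    return {k: instances[k] for k in allowed}, edges


# Usage: a hypothetical user profile restricted to the "Centre" district.
facts = list(semantic_view(SOURCES))
instances = indicator_instances(INDICATOR, facts)
nodes, edges = exploration_graph(
    instances, INDICATOR["dimensions"], {"ex:District": {"Centre"}}
)
print(nodes)  # {('Centre', 2019): 430.0, ('Centre', 2020): 410.0}
print(edges)  # [(('Centre', 2019), ('Centre', 2020))]
```

Keeping the field-to-concept mapping separate from the records mirrors the schema-on-read idea: each source keeps its own field names, and the uniform, concept-based view is only built when the data are read.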