Explainable Similarity of Datasets Using Knowledge Graph

Abstract

There is a large quantity of datasets available as Open Data on the Web. However, it is challenging for users to find datasets relevant to their needs, even though the datasets are registered in catalogs such as the European Data Portal. This is because the available metadata such as keywords or textual description is not descriptive enough. At the same time, datasets exist in various types of contexts not expressed in the metadata. These may include information about the dataset publisher, the legislation related to dataset publication, language and cultural specifics, etc. In this paper we introduce a similarity model for matching datasets. The model assumes an ontology/knowledge graph, such as Wikidata.org, that serves as a graph-based context to which individual datasets are mapped based on their metadata. A similarity of the datasets is then computed as an aggregation over paths among nodes in the graph. The proposed similarity aims at addressing the problem of explainability of similarity, i.e., providing the user a structured explanation of the match which, in a broader sense, is nowadays a hot topic in the field of artificial intelligence.
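
The abstract describes the similarity only at a high level: datasets are mapped to nodes of a knowledge graph, and the score is an aggregation over paths among those nodes, with the paths serving as the explanation. The following is a minimal sketch of that idea, not the authors' model. The toy graph, the entity names, the use of shortest paths, and the mean of 1 / (1 + path length) as the aggregation are all assumptions made for illustration.

```python
from collections import deque
from itertools import product

# Toy undirected knowledge graph. Entity names are hypothetical,
# not real Wikidata identifiers.
EDGES = [
    ("air_quality", "pollution"),
    ("pollution", "environment"),
    ("traffic", "pollution"),
    ("traffic", "transport"),
    ("budget", "finance"),
]


def build_adjacency(edges):
    """Build an adjacency map {node: set(neighbours)} from edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; return one shortest path as a node list, or None."""
    if start == goal:
        return [start]
    if start not in adj or goal not in adj:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted for deterministic output
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def explain_similarity(adj, nodes_a, nodes_b):
    """Return (score, paths) for two datasets given their mapped graph nodes.

    Assumed aggregation: mean over all node pairs of 1 / (1 + path length),
    counting 0 for pairs with no connecting path. The returned paths are
    the structured explanation of the match.
    """
    paths, total, pairs = [], 0.0, 0
    for a, b in product(sorted(nodes_a), sorted(nodes_b)):
        pairs += 1
        path = shortest_path(adj, a, b)
        if path is not None:
            total += 1.0 / len(path)  # len(path) equals number of edges + 1
            paths.append(path)
    score = total / pairs if pairs else 0.0
    return score, paths


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    score, paths = explain_similarity(adj, {"air_quality"}, {"traffic"})
    print(score, paths)
```

In this sketch, a dataset mapped to `air_quality` and one mapped to `traffic` are connected through `pollution`, so the returned path itself tells the user why the two datasets were matched. A real system would need a mapping step from dataset metadata to graph entities, which the abstract mentions but does not specify.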

Bibliographic details

Source
International Conference on Similarity Search and Applications, 2019, pp. 103-110 (8 pages)
Conference location: Newark (US)
Authors
Petr Skoda; Jakub Klimek; Martin Necasky; Tomas Skopal
Author affiliation

Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranske namesti 25, 118 00 Praha 1, Czech Republic

Original format: PDF
Keywords
Similarity; Datasets; Search; Graph

Similar documents

1. Template edge similarity graph clustering for mining multiple gene expression datasets [J]. Salem Saeed. International journal of data mining and bioinformatics. 2017, No. 1

2. Hybrid coexpression link similarity graph clustering for mining biological modules from multiple gene expression datasets [J]. Saeed Salem, Cagri Ozcaglar. BioData Mining. 2014, No. 1

3. Content-based Union and Complement Metrics for Dataset Search over RDF Knowledge Graphs [J]. ACM journal of data and information quality. 2020, No. 2

4. Explainable Similarity of Datasets Using Knowledge Graph [C]. Petr Skoda, Jakub Klimek, Martin Necasky, International Conference on Similarity Search and Applications. 2019

5. Hashing Based Similarity Search over Massive Datasets [D]. Li, Jinfeng. 2018

6. Hybrid coexpression link similarity graph clustering for mining biological modules from multiple gene expression datasets [O]. Saeed Salem, Cagri Ozcaglar. 2014

7. Making Explainable Friend Recommendations Based on Concept Similarity Measurements via a Knowledge Graph [O]. Shaohua Tao, Runhe Qiu, Yuan Ping, 2020
