MR-SimLab: Scalable subgraph selection with label similarity for big data

Dhifli Wajdi; Aridhi Sabeur; Nguifo Engelbert Mephu

Abstract

With the increasing size and complexity of available databases, existing machine learning and data mining algorithms face a scalability challenge. In many applications, the number of features describing the data can be extremely high, which hinders further exploration or even makes it infeasible. Many of these features are in fact redundant or simply irrelevant. Hence, feature selection plays a key role in overcoming information overload, especially in big data applications. Since many complex datasets can be modeled as graphs of interconnected labeled elements, this work focuses on feature selection for subgraph patterns. We propose MR-SimLab, a MapReduce-based approach for subgraph selection from large input subgraph sets. In many applications, it is easy to compute pairwise similarities between the labels of graph nodes. Our approach leverages this information to measure approximate subgraph matching by aggregating the elementary label similarities between matched nodes. Based on the aggregated similarity scores, it selects a small subset of informative representative subgraphs. We provide a distributed implementation of our algorithm on top of the MapReduce framework that optimizes the computational efficiency of our approach for big data applications. We experimentally evaluate MR-SimLab on real datasets. The results show that our approach is scalable and that the selected subgraphs are informative. (C) 2017 Elsevier Ltd. All rights reserved.
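The idea described in the abstract (aggregate label similarities over matched nodes, keep a small set of representatives, split the work MapReduce-style) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the fixed node-order matching, the averaging aggregate, the greedy threshold rule, and all names (`label_sim`, `subgraph_sim`, `select_representatives`, `map_reduce_select`) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Assumption: a subgraph is reduced to its node labels, listed in an order
# that already encodes the node-to-node matching between two subgraphs.
Subgraph = Tuple[str, ...]
LabelSim = Dict[Tuple[str, str], float]


def label_sim(a: str, b: str, sim: LabelSim) -> float:
    """Similarity of two labels: 1.0 if identical, else a symmetric lookup."""
    if a == b:
        return 1.0
    return sim.get((a, b), sim.get((b, a), 0.0))


def subgraph_sim(g: Subgraph, h: Subgraph, sim: LabelSim) -> float:
    """Aggregate (here: average) label similarity over matched nodes."""
    if not g or len(g) != len(h):
        return 0.0  # assumption: different sizes never match
    return sum(label_sim(a, b, sim) for a, b in zip(g, h)) / len(g)


def select_representatives(
    subgraphs: List[Subgraph], sim: LabelSim, threshold: float
) -> List[Subgraph]:
    """Greedy pass: keep a subgraph only if no kept one is similar enough."""
    reps: List[Subgraph] = []
    for g in subgraphs:
        if all(subgraph_sim(g, r, sim) < threshold for r in reps):
            reps.append(g)
    return reps


def map_reduce_select(
    subgraphs: List[Subgraph], sim: LabelSim, threshold: float, n_partitions: int
) -> List[Subgraph]:
    """Emulate map/reduce in-process: select per partition, then merge."""
    partitions = [subgraphs[i::n_partitions] for i in range(n_partitions)]
    # "Map" step: each partition is processed independently.
    local = [select_representatives(p, sim, threshold) for p in partitions]
    # "Reduce" step: concatenate the per-partition representatives.
    return [g for part in local for g in part]


if __name__ == "__main__":
    sim = {("A", "B"): 0.9}  # hypothetical label similarity values
    data = [("A", "C"), ("B", "C"), ("D", "D")]
    print(select_representatives(data, sim, threshold=0.9))
```

In this sketch each partition is filtered independently, so the merged result can still contain similar subgraphs that landed in different partitions; whether and how the paper handles this is not stated in the abstract.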

Bibliographic record

Source
Information Systems | 2017, Issue 9 | pp. 155-163 | 9 pages
Authors
Dhifli Wajdi; Aridhi Sabeur; Nguifo Engelbert Mephu
Affiliations

Univ Evry Val Essonne, Inst Syst & Synthet Biol, F-91030 Evry, France;

Univ Lorraine, LORIA, F-54506 Vandoeuvre Les Nancy, France;

Univ Clermont Auvergne, CNRS, LIMOS, F-63000 Clermont Ferrand, France;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Feature selection; Subgraph mining; Label similarity; MapReduce
Date added to database: 2022-08-18 02:47:41
