首页> 外文学位 >Query and Mining in Large Graph Databases.
【24h】

Query and Mining in Large Graph Databases.

机译:大型图数据库中的查询和挖掘。

获取原文
获取原文并翻译 | 示例

摘要

Graph has powerful ability to model complex structural relationships among data objects and has been widely used in various applications. Along with the development of the application domains, graph databases become large and are growing rapidly in size. This brings researchers new challenges on graph query and mining, among which we mainly focus on investigating the following three problems: how to find the correspondence between the nodes of two large graphs so that some substructures in one graph are mapped to similar substructures in the other; another problem is how to retrieve similar graphs for a query graph from a graph database consisting of a large number of graphs; and the last problem is how to extract subgraph features to build an automated classification model for a graph database containing graphs which belong to different classes.;In this thesis, for the first problem, we propose a novel two-step approach which can efficiently match two large graphs over thousands of nodes with high matching quality. In the first stage, we design an anchor-selection/expansion scheme to construct a good initial matching heuristically. In the second stage, we propose a new approach to refine the initial matching and give the optimality of our refinement algorithm. Our approach can produce an approximate matching result with high quality and efficiency. To address the second problem, we introduce a new graph distance measure based on the maximum common subgraphs (MCS) of two graphs which can thoroughly capture the common as well as different structures of two graphs. Since computing the MCS of two graphs is NP-complete, to answer the top-k graph similarity query efficiently, we propose a fast algorithm which can significantly reduce the number of MCS computations. This algorithm prunes the unqualified graphs based on three lower bounds in which the first two are derived based on the structures of two graphs and the third is obtained based on the triangle property of the distance measure. Three index schemes are designed with different tradeoffs between pruning power and construction cost to assist the query processing. For the third problem, we identify two main issues of the current widely-used discriminative score for feature selection, and introduce a new diversified discriminative score to explore the additional value of the diversity together with the discriminativity. We analyze the properties of the newly-proposed diversified discriminative score from several perspectives and demonstrate that this score can make positive/negative graphs more separable. New algorithms are also proposed to select features based on the new score and they are shown to have high classification accuracy.
机译:图具有强大的能力来建模数据对象之间的复杂结构关系,并已广泛用于各种应用程序中。随着应用程序域的发展,图形数据库变得庞大并且规模迅速增长。这给研究人员带来了图查询和挖掘方面的新挑战,其中我们主要集中在研究以下三个问题:如何找到两个大图的节点之间的对应关系,以便一个图中的某些子结构映射到另一个图中的相似子结构;另一个问题是如何从由大量图形组成的图形数据库中检索查询图形的相似图形。最后一个问题是如何提取子图特征,为包含不同类别图的图数据库建立一个自动分类模型。本文针对第一个问题,提出了一种新颖的两步法,该方法可以有效地进行匹配。匹配质量高的数千个节点上的两个大图。在第一阶段,我们设计了锚点选择/扩展方案,以启发式地构造良好的初始匹配。在第二阶段,我们提出了一种新的方法来优化初始匹配并给出优化算法的最优性。我们的方法可以产生高质量和高效率的近似匹配结果。为了解决第二个问题,我们基于两个图的最大公共子图(MCS)引入了一种新的图距离度量,它可以彻底捕获两个图的公共以及不同结构。由于计算两个图的MCS是NP完全的,为了有效地回答top-k图相似性查询,我们提出了一种可显着减少MCS计算数量的快速算法。该算法基于三个下限修剪不合格的图,其中前两个是根据两个图的结构导出的,而下三个是基于距离度量的三角形属性而获得的。设计了三种索引方案,它们在修剪能力和建筑成本之间具有不同的权衡,以辅助查询处理。对于第三个问题,我们确定了当前在特征选择中广泛使用的判别分数的两个主要问题,并引入了一个新的判别分数以探索多样性的附加价值以及判别力。我们从几个角度分析了新提出的多样化判别分数的性质,并证明了该分数可以使正/负图更可分离。还提出了新算法来基于新分数选择特征,并且显示出它们具有高分类精度。

著录项

  • 作者

    Zhu, Yuanyuan.;

  • 作者单位

    The Chinese University of Hong Kong (Hong Kong).;

  • 授予单位 The Chinese University of Hong Kong (Hong Kong).;
  • 学科 Engineering System Science.
  • 学位 Ph.D.
  • 年度 2013
  • 页码 161 p.
  • 总页数 161
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号