Query and Mining in Large Graph Databases.

机译：大型图数据库中的查询和挖掘。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

Graph has powerful ability to model complex structural relationships among data objects and has been widely used in various applications. Along with the development of the application domains, graph databases become large and are growing rapidly in size. This brings researchers new challenges on graph query and mining, among which we mainly focus on investigating the following three problems: how to find the correspondence between the nodes of two large graphs so that some substructures in one graph are mapped to similar substructures in the other; another problem is how to retrieve similar graphs for a query graph from a graph database consisting of a large number of graphs; and the last problem is how to extract subgraph features to build an automated classification model for a graph database containing graphs which belong to different classes.;In this thesis, for the first problem, we propose a novel two-step approach which can efficiently match two large graphs over thousands of nodes with high matching quality. In the first stage, we design an anchor-selection/expansion scheme to construct a good initial matching heuristically. In the second stage, we propose a new approach to refine the initial matching and give the optimality of our refinement algorithm. Our approach can produce an approximate matching result with high quality and efficiency. To address the second problem, we introduce a new graph distance measure based on the maximum common subgraphs (MCS) of two graphs which can thoroughly capture the common as well as different structures of two graphs. Since computing the MCS of two graphs is NP-complete, to answer the top-k graph similarity query efficiently, we propose a fast algorithm which can significantly reduce the number of MCS computations. This algorithm prunes the unqualified graphs based on three lower bounds in which the first two are derived based on the structures of two graphs and the third is obtained based on the triangle property of the distance measure. Three index schemes are designed with different tradeoffs between pruning power and construction cost to assist the query processing. For the third problem, we identify two main issues of the current widely-used discriminative score for feature selection, and introduce a new diversified discriminative score to explore the additional value of the diversity together with the discriminativity. We analyze the properties of the newly-proposed diversified discriminative score from several perspectives and demonstrate that this score can make positive/negative graphs more separable. New algorithms are also proposed to select features based on the new score and they are shown to have high classification accuracy.

机译：图具有强大的能力来建模数据对象之间的复杂结构关系，并已广泛用于各种应用程序中。随着应用程序域的发展，图形数据库变得庞大并且规模迅速增长。这给研究人员带来了图查询和挖掘方面的新挑战，其中我们主要集中在研究以下三个问题：如何找到两个大图的节点之间的对应关系，以便一个图中的某些子结构映射到另一个图中的相似子结构;另一个问题是如何从由大量图形组成的图形数据库中检索查询图形的相似图形。最后一个问题是如何提取子图特征，为包含不同类别图的图数据库建立一个自动分类模型。本文针对第一个问题，提出了一种新颖的两步法，该方法可以有效地进行匹配。匹配质量高的数千个节点上的两个大图。在第一阶段，我们设计了锚点选择/扩展方案，以启发式地构造良好的初始匹配。在第二阶段，我们提出了一种新的方法来优化初始匹配并给出优化算法的最优性。我们的方法可以产生高质量和高效率的近似匹配结果。为了解决第二个问题，我们基于两个图的最大公共子图（MCS）引入了一种新的图距离度量，它可以彻底捕获两个图的公共以及不同结构。由于计算两个图的MCS是NP完全的，为了有效地回答top-k图相似性查询，我们提出了一种可显着减少MCS计算数量的快速算法。该算法基于三个下限修剪不合格的图，其中前两个是根据两个图的结构导出的，而下三个是基于距离度量的三角形属性而获得的。设计了三种索引方案，它们在修剪能力和建筑成本之间具有不同的权衡，以辅助查询处理。对于第三个问题，我们确定了当前在特征选择中广泛使用的判别分数的两个主要问题，并引入了一个新的判别分数以探索多样性的附加价值以及判别力。我们从几个角度分析了新提出的多样化判别分数的性质，并证明了该分数可以使正/负图更可分离。还提出了新算法来基于新分数选择特征，并且显示出它们具有高分类精度。

著录项

作者
Zhu, Yuanyuan.;
展开▼
作者单位

The Chinese University of Hong Kong (Hong Kong).;

展开▼
授予单位 The Chinese University of Hong Kong (Hong Kong).;
学科 Engineering System Science.
学位 Ph.D.
年度 2013
页码 161 p.
总页数 161
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Mining Related Queries from Web Search Engine Query Logs Using an Improved Association Rule Mining Model [J] . Xiaodong Shi, Christopher C. Yang Journal of the American Society for Information Science and Technology . 2007,第12期

机译：使用改进的关联规则挖掘模型从Web搜索引擎查询日志中挖掘相关查询
2. BioQuery: an object framework for building queries to biomedical databases. [J] . Brundege JM, Dubay C Bioinformatics . 2003,第7期

机译：BioQuery：一个对象框架，用于建立对生物医学数据库的查询。
3. Mining Query Plans for Finding Candidate Queries and Sub-Queries for Materialized Views in BI Systems Without Cube Generation [J] . Atul Thakare, Srijay Deshpande, Amit Kshirsagar, Computing and informatics . 2019,第2期

机译：挖掘查询计划，以在没有多维数据集生成的BI系统中查找物化视图的候选查询和子查询
4. Mining Large Query Induced Graphs towards a Hierarchical Query Folksonomy [C] . Alexandre P. Francisco, Ricardo Baeza-Yates, Arlindo L. Oliveira String processing and information retrieval . 2010

机译：向分层查询Folksonomy挖掘大型查询诱导图
5. QP-Subdue: Processing queries over graph databases. [D] . Goyal, Ankur. 2015

机译：QP-Subdue：处理图形数据库上的查询。
6. A security architecture for query tools used to access large biomedical databases. [O] . Shawn N. Murphy, Henry C. Chueh 2002

机译：用于访问大型生物医学数据库的查询工具的安全体系结构。
7. Enhancing association rules algorithms for mining distributed databases. Integration of fast BitTable and multi-agent association rules mining in distributed medical databases for decision support. [O] . Abdo Walid Adly Atteya 2012

机译：增强用于挖掘分布式数据库的关联规则算法。快速BitTable和多代理关联规则挖掘在分布式医疗数据库中的集成，以提供决策支持。
8. Comparison of SQL, QBE, and DFQL as Query Languages for Relational Databases. [R] . Girsang, P. 1994

机译：sQL，QBE和DFQL作为关系数据库查询语言的比较。

Query and Mining in Large Graph Databases.

摘要

著录项

相似文献

相关主题

期刊订阅