Mining and indexing graph databases.

机译：挖掘和索引图形数据库。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

Graphs are widely used to model structures and relationships of objects in various scientific and commercial fields. Chemical molecules, proteins, malware system-call dependencies and three-dimensional mechanical parts are all modeled as graphs. In this dissertation, we propose to mine and index those graph data to enable fast and scalable search.;There are two common search scenarios: subgraph search and supergraph search. In a subgraph search, given a query graph q, the algorithm searches for all graphs that have q as a subgraph, from a graph database. A supergraph search, on the other hand, retrieves all the database graphs that have q as a supergraph. Determining whether a graph is a subgraph (or supergraph) of another is an NP-complete problem. Hence, it is challenging to efficiently process graph queries on large databases. Graph indices are commonly built to fast process graph queries. Subgraph patterns are mined from the graph database to build such graph indices. It has been shown that both the index structure of the graph indices and the subgraph patterns that are selected for indexing determine the time of processing graph queries. We address the graph search problem by designing innovative index structures and proposing new subgraph pattern mining algorithms.;First, we propose an index structure, Lindex, which organizes subgraph patterns in a lattice. As a novel index structure, Lindex (1) is effective in filtering false graphs, (2) provides fast index lookups, (3) is fast with respect to index construction and maintenance, and (4) can be constructed using any set of substructure index patterns. These four properties result in a fast and scalable graph-querying infrastructure. Lindex is compatible with any choice of patterns. Empirically, we demonstrate that Lindex used in conjunction with subgraph patterns proposed in previous works outperforms other specifically designed index structures.;Second, we address the pattern mining problem by modeling it as a combinatory optimization problem. Instead of mining subgraph patterns based on heuristics as in previous works, we proposed a greedy algorithm, which mines index patterns that minimize the query-processing time with theoretical guarantee. Specifically for the supergraph search, previous algorithms are proposed to optimize either a filtering gain or a prefix-sharing gain. No single subgraph-mining algorithm considers both gains, although these two gains jointly model the savings of the query-processing time. We are the first one to mine subgraph patterns to optimize both the filtering gain and the prefix-sharing gain.;Finally, we study the update of graph indices. All previous algorithms are mine-at-once algorithms in which the whole set of index patterns is mined in one run before building a graph index. Because of the change of environments, such as an update of the graph database and the increase of available memory, the index needs to be updated to accommodate such changes. Most of the mine-at-once algorithms are time-consuming because they involve frequent subgraph or subtree mining over the whole graph database. Also, constructing and deploying a new index involves expensive disk operations. Hence, it is inefficient to re-mine the patterns and rebuild the index from scratch. We propose an iterative mining algorithm and a one-pass mining algorithm to address the index-update problem. Both algorithms replace an old index pattern p' with a newly selected subgraph pattern p until the decrease of the query-processing time caused by the swap is lower than a threshold. The iterative mining algorithm always mines the new subgraph pattern p that maximizes the decrease of the query-processing time. And it can achieve an approximation ratio of 1 - 1/e. The one-pass mining algorithm, on the other hand, can choose any pattern p that meets a criterion f(p, p ') > 0, where the function f(·) measures the advantage of indexing p over p '. The one-pass mining algorithm have an approximation ratio of 1/4 with faster running-time than iterative mining..;We conduct extensive empirical studies to study the performance of our proposed algorithms. The experiment results demonstrate the effectiveness of our algorithms by showing their improvement over the state-of-the-art approaches over existing benchmark datasets.

机译：图在各种科学和商业领域中广泛用于建模对象的结构和关系。化学分子，蛋白质，恶意软件的系统调用依存关系和三维机械零件都被建模为图形。本文提出对这些图数据进行挖掘和索引，以实现快速，可扩展的搜索。有两种常见的搜索场景：子图搜索和上图搜索。在子图搜索中，给定查询图q，算法从图数据库中搜索所有以q为子图的图。另一方面，超级图搜索将检索所有以q为超级图的数据库图。确定图是否是另一个图的子图（或超图）是一个NP完全问题。因此，在大型数据库上有效地处理图形查询具有挑战性。图形索引通常用于快速处理图形查询。从图形数据库中挖掘子图形模式以构建此类图形索引。已经表明，图索引的索引结构和选择用于索引的子图模式都确定了处理图查询的时间。我们通过设计创新的索引结构并提出新的子图模式挖掘算法来解决图搜索问题。首先，我们提出一种索引结构Lindex，该索引结构将子图模式组织在一个网格中。作为一种新颖的索引结构，Lindex（1）可有效过滤错误图，（2）提供快速的索引查找，（3）索引构建和维护方面的速度很快，并且（4）可以使用任何子结构集进行构建索引模式。这四个属性构成了快速且可扩展的图形查询基础结构。 Lindex与任何模式选择兼容。从经验上讲，我们证明了结合先前工作中提出的子图模式使用的Lindex优于其他专门设计的索引结构。第二，我们将模式挖掘问题建模为组合优化问题，从而解决了模式挖掘问题。与其像以前的工作那样基于启发式挖掘子图模式，我们提出了一种贪婪算法，该算法挖掘索引模式以在理论上保证最小化查询处理时间。特别是对于超图搜索，提出了先前的算法来优化滤波增益或前缀共享增益。尽管这两个增益共同模拟了查询处理时间的节省，但是没有一个子图挖掘算法会同时考虑这两个增益。我们是第一个挖掘子图模式以同时优化滤波增益和前缀共享增益的人。最后，我们研究了图索引的更新。以前的所有算法都是一次挖掘算法，其中在建立图形索引之前一次运行要挖掘整个索引模式集。由于环境的变化（例如图形数据库的更新和可用内存的增加），因此需要更新索引以适应这种变化。大多数一次挖掘算法都很耗时，因为它们涉及整个图数据库中频繁的子图或子树挖掘。同样，构造和部署新索引涉及昂贵的磁盘操作。因此，重新挖掘模式并从头开始重建索引效率很低。我们提出了一种迭代挖掘算法和一个单次挖掘算法来解决索引更新问题。两种算法都用新选择的子图模式p替换了旧的索引模式p'，直到由交换引起的查询处理时间的减少低于阈值。迭代挖掘算法始终会挖掘新的子图模式p，以最大程度地减少查询处理时间。可以达到1-1 / e的近似比率。另一方面，单程挖掘算法可以选择满足标准f（p，p'）> 0的任何模式p，其中函数f（·）衡量索引p优于p'的优势。单遍挖掘算法的近似比率为1/4，运行时间比迭代挖掘快。.;我们进行了广泛的经验研究，以研究所提出算法的性能。实验结果通过展示它们相对于现有基准数据集的最新方法的改进来证明我们算法的有效性。

著录项

作者
Yuan, Dayu.;
展开▼
作者单位

The Pennsylvania State University.;

展开▼
授予单位 The Pennsylvania State University.;
学科 Computer science.;Information science.
学位 Ph.D.
年度 2013
页码 133 p.
总页数 133
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Genome analysis with gene-indexing databases. [J] . Yuan J, Liu Y, Wang Y, Pharmacology and Therapeutics: The Journal of the International Encyclopedia of Pharmacology and Therapeutics . 2001,第2期

机译：使用基因索引数据库进行基因组分析。
2. Population groups: indexing, coverage, and retrieval effectiveness of ethnically related health care issues in health sciences databases. [J] . E N Efthimiadis, M Afifi Bulletin of the Medical Library Association. . 1996,第3期

机译：人群：健康科学数据库中与种族相关的医疗保健问题的索引编制，覆盖范围和检索效力。
3. Data pre-processing to improve the mining of large feed databases. [J] . Maroto-Molina F., Gomez-Cabrera A., Guerrero-Ginel J. E., Animal . 2013,第7期

机译：数据预处理可改善大型供稿数据库的挖掘。
4. Mining and Indexing Graphs for Supergraph Search [C] . Dayu Yuan, Prasenjit Mitra, C. Lee Giles International conference on very large data bases . 2013

机译：超级图搜索的挖掘和索引图
5. Indexing and mining multimedia databases. [D] . Korn, Philip Russell (Flip). 1998

机译：索引和挖掘多媒体数据库。
6. From University Matriculation Registers to Historical Personal Databases. Digitisation, Indexing and Prosopographic Classification [O] . Ingo Runde 2021

机译：从大学预留寄存器到历史个人数据库。数字化，索引和勘探分类

Mining and indexing graph databases.

摘要

著录项

相似文献

相关主题

期刊订阅