Recent decades witnessed a rapid proliferation of graph data, such as chemicalstructures and business processes. Structural queries are frequently issued in thesedomains, and hence, attract extensive attention. Due to data inconsistency andnoise, a recent trend is to study similarity queries. Moreover, graphs may possessmultiple attributes on vertices, imposing additional challenges. Hence, indexingand processing methods for answering advanced structural queries are strongly indemand.The thesis studies three types of emerging structural queries in large-scale graphdatabases, including graph similarity join, graph similarity search and graph substructureselection.Graph similarity join retrieves all similarity pairs from two graph collectionssuch that their similarity satisfies a given threshold. Particularly, we incorporategraph edit distance as similarity measure. Inspired by the q-gram idea for stringsimilarity queries, we introduce path-based q-grams on graphs, and establish acount filtering lower bound for generating candidates. An efficient algorithm isproposed to exploit both matching and mismatching features. We also presentlabel and degree-based matching conditions to strengthen the filtering techniques.Leveraging offline index, we investigate efficient graph similarity search. Existingsolutions employ fixed-size overlapping substructures, and thus, are susceptible to large vertex degrees and thresholds. Disparately, we present a partition-basedapproach by dividing graphs into variable-size non-overlapping partitions. The editdistance constraint is converted to a containment constraint for candidate generation.A dynamic pruning technique and an improved verification algorithm arealso developed. In addition, a cost-aware graph partitioning technique is conceivedto optimize the index.Equipping structural queries with rich semantics, substructure selection overmulti-attributed graphs finds all subgraph isomorphic mappings such that each pairof matching vertices satisfy a set of selection conditions, each against an equality,range, or set containment operator on a vertex attribute. Existing techniquesfor single-labeled graphs are developed under the assumption of identical labelmatching, and thus, cannot handle the general cases. To this end, we proposea two-tier index via judiciously materializing feature mappings. We also devisea scalable indexing and query processing method exploiting execution plans withminimum cost.
展开▼