
Stochastic graph grammars: Parameter learning and applications.



Abstract

Although traditional data mining algorithms often assume the data to have a feature-vector representation, emerging domains for data mining such as bioinformatics and social networks require the underlying data to be represented as graphs. In this thesis, we present stochastic graph grammars as probability models suitable for mining graph databases.

We first consider the problem of learning the parameters of stochastic graph grammars from training data, and observe that the only known solution to this problem, PEGG (Parameter Estimation for Graph Grammars), relies on bottom-up parsing of the training instances. Parsing can be prohibitively expensive for many real grammars. To overcome this difficulty, we present the QuickPEGG algorithm for learning the grammar parameters more efficiently. QuickPEGG exploits the fact that, for most grammars, the probability distributions of the number of nodes and the number of edges of the underlying graph distribution contain sufficient information to infer the grammar parameters, rendering parsing unnecessary for parameter learning.

We then consider applications of the learned grammar to knowledge discovery in the underlying domain. In the existing literature, most applications of graph grammars are dominated by sampling and parsing. However, we show that once the stochastic graph grammar has been learned, interesting statistics such as the degree distribution may be inferred from it. Because of this property, stochastic graph grammars may be used as generative models for useful families of networks, such as scale-free networks, in which the degree of a node follows a power-law distribution.

In summary, our work establishes stochastic graph grammars as a framework suitable for mining graph databases; the algorithms presented in this thesis improve our ability to efficiently learn grammar parameters from training data, and the applications shown here enable a deeper understanding of the underlying graph distribution once the grammar has been learned.
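The QuickPEGG algorithm itself is not reproduced in this abstract. The following is a minimal, hypothetical sketch of the underlying idea only: that the distribution of graph sizes can identify rule probabilities without parsing. The toy one-nonterminal grammar, the function names, and the method-of-moments estimator below are illustrative choices of this sketch, not constructs taken from the thesis.

```python
import random

def sample_graph_size(p, rng):
    """Sample the node count generated by a toy stochastic grammar.

    The grammar has a single nonterminal S and two rules:
      S -> (add a node, then S)   with probability p
      S -> (add a node, stop)     with probability 1 - p
    so the node count is geometric with mean 1 / (1 - p).
    """
    n = 1
    while rng.random() < p:
        n += 1
    return n

def estimate_p(sizes):
    """Method-of-moments estimate of the recursion probability.

    Since E[node count] = 1 / (1 - p), we invert the sample mean:
    p = 1 - 1 / mean(sizes). No parsing of the graphs is needed.
    """
    mean = sum(sizes) / len(sizes)
    return 1.0 - 1.0 / mean

rng = random.Random(0)
true_p = 0.7
sizes = [sample_graph_size(true_p, rng) for _ in range(100_000)]
print(f"estimated p = {estimate_p(sizes):.3f}")  # close to 0.7
```

For this one-rule grammar the node-count distribution determines the single parameter exactly; the thesis's claim is the analogous statement for realistic grammars, where the joint node- and edge-count distributions carry enough information.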

Bibliographic information

  • Author: Mukherjee, Sourav.
  • Affiliation: University of Maryland, Baltimore County.
  • Degree-granting institution: University of Maryland, Baltimore County.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 126 p.
  • Format: PDF
  • Language: English

