...
【24h】

Graph Traversal Edit Distance and Extensions

机译:图遍历编辑距离和扩展

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this article, we give a new graph kernel, which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the GTED is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of a support vector machine (SVM) classifier on a few data sets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested data sets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next-generation sequencing reads.
机译:应用机器学习协议的许多问题与图形(也称为网络),包括社交网络,安全性,Web数据挖掘,蛋白质功能预测和基因组信息学。内核范例精美地从底层几何空间解耦了学习算法,该空间呈现图形内核对上述应用的重要性。在本文中,我们提供了一个新的图形内核,我们调用图形遍历编辑距离(GTED)。我们介绍了GTED问题,并为其提供了第一种多项式时间算法。非正式地,GTED是由两个图形的各个欧拉遍历的边缘标签形成的两个字符串之间的最小编辑距离。此外,GTED通过并提供了第一个数学形式主义,用于序列共同组装和生物信息学中的DE Novo变异检测。我们证明GTED在图表产品空间中使用线性程序承认了多项式时间算法,以产生整数解决方案。据我们所知,这是第一种解决这个问题的方法。我们还提供了一个线性编程放松算法,用于GTED上的下限。我们使用GTED作为图形内核,并通过计算文献中的一些数据集上的支持向量机(SVM)分类器的准确性来评估它。我们的结果表明,我们的内核在测试的数据集中赢得了许多公共图形内核。作为第二组实验,我们成功使用从De Novo集合读取的De Novo组装获得的组装图上使用GTED进行群集病毒基因组。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号