International Conference on Tools with Artificial Intelligence

An Experimental Study on Learning with Good Edit Similarity Functions



Abstract

Similarity functions are essential to many learning algorithms. To allow their use in support vector machines (SVM), i.e., for the convergence of the learning algorithm to be guaranteed, they must be valid kernels. In the case of structured data, similarities based on the popular edit distance often do not satisfy this requirement, which explains why they are typically used with k-nearest neighbor (k-NN) classifiers. A common approach to using such edit similarities in SVM is to transform them into potentially (but not provably) valid kernels. Recently, a different theory of learning with (ε,γ,τ)-good similarity functions was proposed, allowing the use of non-kernel similarity functions. Moreover, the resulting models are supposedly sparse, as opposed to standard SVM models that can be unnecessarily dense. In this paper, we study the relevance and applicability of this theory in the context of string edit similarities. We show that edit similarities are naturally good for a given string classification task and provide experimental evidence that the obtained models not only clearly outperform the k-NN approach, but are also competitive with standard SVM models learned with state-of-the-art edit kernels, while being much sparser.
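The core objects in the abstract can be illustrated concretely: an edit distance between strings, a similarity derived from it, and a landmark-based representation over which the (ε,γ,τ)-goodness framework learns a sparse linear classifier. The sketch below is illustrative only: the exponential distance-to-similarity mapping and the landmark construction are assumptions for the example, not the paper's exact formulation.

```python
import math

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def edit_similarity(a: str, b: str) -> float:
    """Map edit distance to a similarity in (0, 1]; identical strings give 1.
    This similarity is generally not a valid kernel."""
    return math.exp(-levenshtein(a, b))

def feature_map(x: str, landmarks: list[str]) -> list[float]:
    """Represent x by its similarities to a set of landmark strings.
    In the (epsilon, gamma, tau)-goodness framework, a sparse linear
    classifier is learned over this representation instead of requiring
    the similarity itself to be a kernel."""
    return [edit_similarity(x, l) for l in landmarks]
```

Because the learned classifier operates on similarity values to landmarks rather than on a kernel matrix, no positive semi-definiteness is needed, which is what makes non-kernel edit similarities usable here.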

